Background of the Invention 

Field of the Invention 

[0001] The present invention is related to video and image coding and in 
particular to systems and methods for coding video image information in a compressed and 
error resilient manner. 
Description of the Related Art 

[0002] MPEG is an ISO/IEC standard developed by MPEG (Moving Picture 
Experts Group). There are several versions of the MPEG standard, such as MPEG-1, MPEG- 
2, MPEG-4, and MPEG-7, and they are intended to standardize certain aspects of image and 
audio compression. As with other forms of video compression, such as H.261, H.262, H.263, 
H.263+, H.263++, and H.26L, MPEG compression attempts to eliminate redundant or irrelevant 
data. For example, an MPEG encoder uses information from selected frames to reduce the 
overall video data that needs to be transmitted for certain other frames. 

[0003] Typically, a video frame can be encoded in one of three ways: as an 
intraframe, as a predicted frame, or as a bi-directional frame. In addition, a video frame can 
also be skipped in order to reduce the resulting file size or bit-rate. An intraframe typically 
contains the complete image data for that frame and so does not rely on image data from 
other frames. Intraframe encoding provides the least compression. A predicted frame 
generally contains just enough information to allow a decoder to display the frame based on a 
recent preceding intraframe or predicted frame. This means that the predicted frame contains 
the data that relates to how the image has changed from the previous frame and residual error 
correction data. A bi-directional frame is generated from information from the surrounding 
intraframe(s) and/or predicted frames, including residual error correction data. Using data 
from the surrounding frames, the decoder uses interpolation to calculate the position and 
color of each pixel. 

[0004] The MPEG-4 standard was developed for use with both low and high bit 
rate applications. For example, MPEG-4 has been enhanced for use in interactive video 
games, videoconferencing, videophones, interactive storage media, multimedia mailing, 
wireless multimedia and broadcasting applications. MPEG-4 provides for object scalability, 
improved error robustness and enhanced compression. 

[0005] The ever-increasing demand for multimedia communications via the 
wired/wireless Internet faces the challenge of packet loss as well as bandwidth fluctuation. 
The dependency between image frames makes the compressed video stream vulnerable even 
to a small number of lost packets. MPEG-4 has therefore been particularly enhanced for use 
in low bit rate (< 64 kbps), error-prone applications, such as mobile, wireless applications, and 
error-prone ATM (asynchronous transfer mode) network applications. Mobile operation 
tends to be more susceptible to transmission errors as there is often less data redundancy, in 
order to reduce bit rates, and greater sources of "noise." For example, wireless channels can 
be corrupted by environmental noise, and in the case of mobile applications, by burst noise 
resulting from multipath fading and shadowing caused by buildings and other structures. 
With respect to ATM network applications, cells can be lost due to network congestion and 
buffer overflow. 

[0006] MPEG-4 has enhanced error resiliency as compared to previous versions 
of MPEG so that video data can be more successfully transmitted over such error prone 
networks. For example, one error resiliency technique provided for by the MPEG-4 standard 
is the use of resync markers in the video bit-stream. In particular, MPEG-4 has adopted fixed 
interval synchronization and specifies that video object plane (VOP) start codes and 
resynchronization markers (i.e., the start of a video packet) appear only at legal fixed interval 
locations in the bitstream. This helps to avoid the problems associated with start code 
emulation. Through the use of resync markers included by an encoder in the video data, 
synchronization lost after an error can be regained by a decoder. 

[0007] Another error resiliency technique provided for by the MPEG-4 standard 
is the use of a reversible variable-length code. This code can be decoded even when read 
backwards, enabling a decoder to use uncorrupted information from a newly found resync 
marker back to the point in the data where the error occurred. Still another error resiliency 
technique adopted by MPEG-4 is data partitioning, used to separate motion information from 
texture information using a second resynchronization marker inserted between motion and 
texture information. Thus, if there is an error and the texture information is undecodable or 
lost, the decoder can utilize the motion information to conceal the error by using the motion 
information to compensate the previous decoded frame or VOP. 

[0008] However, despite the use of the enhancements described above, many 
MPEG-4 encoders and decoders fail to provide sufficient error resiliency as is often desired 
in error-prone applications, such as in cellular phone applications. Thus, the transmission of 
MPEG-4 compliant video streams over cellular networks often results in unrecoverable 
corrupted data and the significant degradation in the quality of the video data seen by a 
recipient. Such video degradation can make video communication over error-prone networks 
undesirable from a user's perspective, and disadvantageously impedes the adoption and use of 
video transmissions over error-prone networks. 

[0009] Further, to meet target bit rates, conventional encoders drop frames to 
reduce the frame rate according to a simple skipping algorithm. For example, a conventional 
encoder will drop 4 of every 5 frames in a video clip to convert the video clip from a 30 
frames per second rate to a 6 frames per second rate. However, this simple form of skipping 
often has a significant adverse impact on the visual quality when decoded. 
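
As a point of reference, the fixed skipping described above can be sketched as follows. This is a minimal illustration rather than code from any particular encoder; the function name `fixed_skip` and its parameters are hypothetical, and the sketch assumes the source rate is an integer multiple of the target rate.

```python
def fixed_skip(num_frames, source_fps=30, target_fps=6):
    """Return the indices of frames kept by naive fixed-interval skipping.

    Hypothetical helper: one frame is kept out of every
    source_fps // target_fps frames, regardless of frame content.
    """
    if target_fps <= 0 or source_fps % target_fps != 0:
        raise ValueError("this sketch assumes an integer skipping ratio")
    step = source_fps // target_fps  # 30 / 6 = 5: keep 1 frame, drop 4
    return list(range(0, num_frames, step))
```

Because the kept indices depend only on position, a frame carrying a scene change or rapid motion is as likely to be dropped as a static one, which is the weakness noted above.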

Summary of the Invention 

[0010] The present invention is related to video encoding and in particular to 
systems and methods for encoding video information for transmission in a compressed 
manner and/or an error resilient manner. Embodiments of the present invention 
advantageously enable the transmission of video information even in low-bit rate, high noise 
environments. For example, embodiments of the present invention enable video transmission 
to be successfully performed over cellular networks and the like. 

[0011] An enhanced Adaptive Intra Refresh (AIR) process provides efficient error 
resiliency by selectively and adaptively encoding macroblocks in a predicted frame. In one 
embodiment, an adaptive motion area process is performed to determine which macroblocks 
are to be intracoded. Adaptive motion change detection can efficiently reduce the large 
propagation error, including errors occurring in the motion area. An intercode distortion 
value and an intracode distortion value are calculated, as are an intercode bit quantity and an 
intracode bit quantity. Based on a comparison of the calculated intercode distortion value and 
the intracode distortion value, and on a comparison of the intercode bit quantity and the 
intracode bit quantity for each macroblock, a decision is made as to which predicted frame 
macroblocks are to be intracoded. 

[0012] One embodiment of the present invention is a method of performing 
adaptive intra refresh, the method comprising: receiving a packet loss probability value; 
receiving a motion vector for a first macroblock in a first frame; mapping the motion vector 
onto portions of a plurality of other macroblocks in a previous frame; calculating at least a 
first transition factor value based at least in part on the mapping; calculating an estimated 
inter distortion value for the first macroblock based on at least the packet loss probability 
value and the first transition factor value; calculating an estimated intra distortion value for 
the first macroblock based on at least the packet loss probability value; receiving a first bit 
quantity value corresponding to a bit quantity used to intracode the first macroblock; 
receiving a second bit quantity value corresponding to a bit quantity used to intercode the 
first macroblock; and based at least in part on the estimated inter distortion value, the 
estimated intra distortion value, the first bit quantity value, and the second bit quantity value, 
transmitting the first macroblock as one of an intra macroblock and an inter macroblock. 

[0013] Another embodiment of the present invention is a method of selectively 
intracoding macroblocks, the method comprising: receiving a packet loss probability value; 
receiving a motion vector for a first macroblock in a first frame; based at least in part on the 
motion vector, determining which portions of macroblocks in a previous frame would be 
used in predicting the first macroblock; calculating at least a first propagation strength value 
based at least in part on determining which portions of macroblocks in the previous frame 
would be used in predicting the first macroblock; calculating an estimated inter distortion 
value for the first macroblock based on at least the packet loss probability value and the first 
propagation strength value; calculating an estimated intra distortion value for the first 
macroblock based on at least the packet loss probability value; calculating a quantization 
distortion value for the first macroblock; receiving a first bit quantity value corresponding to 
a bit quantity used to intracode the first macroblock; receiving a second bit quantity value 
corresponding to a bit quantity used to intercode the first macroblock; and based at least in 
part on the estimated inter distortion value, the estimated intra distortion value, the 
quantization distortion value, the first bit quantity value, and the second bit quantity value, 
providing for transmission the first macroblock as one of an intra macroblock and an inter 
macroblock. 

[0014] Still another embodiment of the present invention is a method of 
selectively intracoding macroblocks in a plurality of macroblocks in a first frame, the method 
comprising: receiving a packet loss probability value; receiving a corresponding motion 
vector for each macroblock in the plurality of macroblocks; based at least in part on the 
corresponding motion vector, determining which portions of macroblocks in a previous frame 
would be used in predicting said each macroblock; calculating for each of said macroblocks 
at least a first corresponding propagation strength value based at least in part on determining 
which portions of macroblocks in the previous frame would be used in predicting said each 
macroblock; calculating for each of said macroblocks an estimated inter distortion value 
based upon at least the packet loss probability value and the corresponding at least first 
propagation strength value; calculating for each of said macroblocks an estimated intra 
distortion value based upon at least the packet loss probability value; calculating for each of 
said macroblocks a quantization distortion value; and based at least in part on the estimated 
inter distortion, the estimated intra distortion, and the quantization distortion values for each 
macroblock in the plurality of macroblocks, designating a subset of the plurality of 
macroblocks to be intracoded. 

[0015] One embodiment of the present invention is a circuit configured to 
selectively intracode macroblocks, the circuit comprising: a first instruction configured to 
receive a packet loss probability value; a second instruction configured to receive a motion 
vector for a first macroblock in a first frame; a third instruction configured to determine, 
based at least in part on the motion vector, which portions of macroblocks in a previous 
frame would be used in predicting the first macroblock; a fourth instruction configured to 
calculate at least a first propagation strength value based at least in part on the determination 
of which portions of macroblocks in the previous frame would be used in predicting the first 
macroblock; a fifth instruction configured to calculate an estimated inter distortion value for 
the first macroblock based on at least the packet loss probability value and the first 
propagation strength value; a sixth instruction configured to calculate an estimated intra 
distortion value for the first macroblock based on at least the packet loss probability value; 
and a seventh instruction configured to selectively provide for transmission the first 
macroblock as one of an intra macroblock and an inter macroblock based at least in part on 
the estimated inter distortion value and the estimated intra distortion value. 



Brief Description of the Drawings 
[0016] Embodiments of the present invention will now be described with 
reference to the drawings summarized below. These drawings and the associated description 
are provided to illustrate example embodiments of the invention, and not to limit the scope of 
the invention. 

[0017] Figure 1A illustrates an example networked system for implementing a 
video distribution system. 

[0018] Figures 1B-C illustrate an example encoder architecture in accordance 
with an embodiment of the present invention. 

[0019] Figures 2A-B illustrate an example refresh map and an example scan order 
that can be used with an embodiment of the present invention. 

[0020] Figure 3 illustrates an example analysis of a video sequence used to locate 
a scene change. 

[0021] Figures 4A-4B illustrate an example of adaptive frame skipping in 
accordance with an embodiment of the present invention. 

[0022] Figure 5 illustrates an example use of second order motion compensation. 

[0023] Figure 6 illustrates an example packetized bitstream. 

[0024] Figure 7 illustrates an example use of consecutive I-frames in accordance 
with an embodiment of the present invention. 

[0025] Figures 8A-H illustrate example processes for adaptive intra refresh. 

[0026] Figure 9 illustrates an example rate control process in accordance with an 
embodiment of the present invention. 

[0027] Figure 10 illustrates an example scene level recursive bit allocation 
process. 

[0028] Figure 11 illustrates an example graph of Forward Error Correction 
overhead vs. average BER correction capability. 

Detailed Description of Preferred Embodiments 

[0029] The present invention is related to video encoding and in particular to 
systems and methods for encoding video information for transmission in a compressed and/or 
an error resilient manner. Thus, embodiments of the present invention advantageously enable 
the transmission of video information even in low-bit rate, noisy, error-prone environments. 
Embodiments of the present invention can be used with a variety of video compression 
standards, such as, by way of example, the MPEG-4 standard, as well as MPEG-1, MPEG-2, 
H.261, H.262, H.263, H.263+, H.263++, and H.26L, and video standards yet to be 
developed. Aspects of the MPEG-4 standard are defined in "Coding of Audio-Visual 
Objects: Systems," 14496-1, ISO/IEC JTC1/SC29/WG11 N2501, November 1998, and 
"Coding of Audio-Visual Objects: Visual," 14496-2, ISO/IEC JTC1/SC29/WG11 N2502, 
November 1998, and the MPEG-4 Video Verification Model is defined in "MPEG-4 Video 
Verification Model 17.0", ISO/IEC JTC1/SC29/WG11 N3515, Beijing, China, July 2000, 
which are incorporated herein by reference in their entirety. 

[0030] Figure 1A illustrates a networked system for implementing a video 
distribution system in accordance with one embodiment of the invention. An encoding 
computer 102 receives a video signal, which is to be encoded to a relatively compact and 
robust format. The encoding computer 102 can correspond to a variety of machine types, 
including general purpose computers that execute software and to specialized hardware. The 
encoding computer 102 can receive a video sequence from a wide variety of sources, such as 
via a satellite receiver 104, a video camera 106, and a video conferencing terminal 108. The 
video camera 106 can correspond to a variety of camera types, such as video camera 
recorders, Web cams, cameras built into wireless devices, and the like. Video sequences can 
also be stored in a data store 110. The data store 110 can be internal to or external to the 
encoding computer 102. The data store 110 can include devices such as tapes, hard disks, 
optical disks, and the like. It will be understood by one of ordinary skill in the art that a data 
store, such as the data store 110 illustrated in Figure 1A, can store unencoded video, encoded 
video, or both. In one embodiment, the encoding computer 102 retrieves unencoded video 
from a data store, such as the data store 110, encodes the unencoded video, and stores the 
encoded video to a data store, which can be the same data store or another data store. It will 
be understood that a source for the video can include a source that was originally taken in a 
film format. 

[0031] The encoding computer 102 distributes the encoded video to a receiving 
device, which decodes the encoded video. The receiving device can correspond to a wide 
variety of devices that can display video. For example, the receiving devices shown in the 
illustrated networked system include a cell phone 112, a personal digital assistant (PDA) 114, 
a laptop computer 116, and a desktop computer 118. The receiving devices can communicate 
with the encoding computer 102 through a communication network 120, which can 
correspond to a variety of communication networks including a wireless communication 
network. It will be understood by one of ordinary skill in the art that a receiving device, such 
as the cell phone 112, can also be used to transmit a video signal to the encoding computer 
102. 

[0032] The encoding computer 102, as well as a receiving device or decoder, can 
correspond to a wide variety of computers. For example, the encoding computer 102 can be 
a microprocessor or processor (hereinafter referred to as processor) controlled device, 
including, but not limited to a terminal device, such as a personal computer, a workstation, a 
server, a client, a mini computer, a main-frame computer, a laptop computer, a network of 
individual computers, a mobile computer, a palm top computer, a hand held computer, a set 
top box for a TV, an interactive television, an interactive kiosk, a personal digital assistant, 
an interactive wireless communications device, a mobile browser, a Web enabled cell phone, 
a personal digital assistant (PDA) or a combination thereof. By way of example, an encoder 
computer may also be included in the camera 106, the cell phone 112, the PDA 114, the 
laptop computer 116, and/or the desktop computer 118. The computer 102 may further 
possess input devices such as a keyboard, a mouse, a trackball, a touch pad, or a touch screen 
and output devices such as a computer screen, printer, speaker, or other input or output 
devices now in existence or later developed. 

[0033] The encoding computer 102, as well as a decoder computer, can 
correspond to a uniprocessor or multiprocessor machine. Additionally, the encoder and 
decoder computers can include an addressable storage medium or computer accessible 
medium, such as random access memory (RAM), an electronically erasable programmable 
read-only memory (EEPROM), masked read-only memory, one-time programmable 
memory, hard disks, floppy disks, laser disk players, digital video devices, Compact Disc 
ROMs, DVD-ROMs, other optical media, video tapes, audio tapes, magnetic recording 
tracks, electronic networks, and other techniques to transmit or store electronic content such 
as, by way of example, programs and data. In one embodiment, the encoding and decoding 
computers are equipped with a network communication device such as a network interface 
card, a modem, Infra-Red (IR) port, a wireless network interface, or other network 
connection device suitable for connecting to a network. Furthermore, the computers execute 
an appropriate operating system, such as Linux, Unix, Microsoft® Windows® 3.1, 
Microsoft® Windows® 95, Microsoft® Windows® 98, Microsoft® Windows® NT, 
Microsoft® Windows® 2000, Microsoft® Windows® Me, Microsoft® Windows® XP, 
Apple® MacOS®, IBM® OS/2®, Microsoft® Windows® CE, or Palm OS®. As is 
conventional, the appropriate operating system may advantageously include a 
communications protocol implementation, which handles all incoming and outgoing message 
traffic passed over the network, which can include a wireless network. In other 
embodiments, while the operating system may differ depending on the type of computer, the 
operating system may continue to provide the appropriate communications protocols 
necessary to establish communication links with the network. 

[0034] Figure 1B illustrates an example encoding system 100B in accordance 
with an embodiment of the present invention. The term encoding system, as used herein, 
includes one or more encoders. The encoding system 100B comprises, by way of example, 
one or more of processors, program logic, or other substrate configurations representing data 
and instructions, which operate as described herein. In other embodiments, the encoding 
system 100B can comprise controller circuitry, integrated circuits, gate arrays, application 
specific circuits, processor circuitry, processors, general purpose single-chip or multi-chip 
microprocessors, digital signal processors, embedded microprocessors, microcontrollers and 
the like, executing software code, including instructions and data stored in computer readable 
memory. By way of example and not limitation, the encoding system 100B can be housed in 
one or more leaded, leadless, or ball grid array semiconductor packages, on one or more 
circuit boards, and/or using one or more hybrid packages. All or portions of the encoding 
system 100B may be included in a fixed terminal, such as a desktop computer, or in a 
portable terminal, such as a cellular phone, portable computer, personal digital assistant, 
video camera, or the like. The encoding system 100B can, in an example embodiment, 
correspond to the encoding computer 102. By way of further example, an encoding system 
in accordance with the present invention can be used to conduct video conferencing, to aid in 
the storage and transmission of movies or other images, and the like. 

[0035] The encoding system 100B encodes and compresses video information for 
transmission to a decoder. The encoding system 100B includes a preprocessing module or 
circuit 102B, a bit allocation module or circuit 104B, and an encoder module or circuit 106B. 
The preprocessing module or circuit 102B, including a video sequence analyzer, is used to 
detect when a scene change has taken place and to determine how a given frame, VOP or 
picture, is to be encoded. 

[0036] A video object layer contains a sequence of 2D representations of arbitrary 
shape at different time intervals that is referred to in MPEG-4 as a video object plane (VOP). 
Each of the VOP regions can be non-rectangular and may correspond to particular image or 
video content of interest, such as physical objects within a scene. Video object planes 
(VOPs) are divided into macroblocks of size 16x16. A macroblock is encoded in six blocks, 
four for luminosity and two for chromaticity, of size 8x8. To obtain a macroblock structure 
from an arbitrary shaped VOP, the bounding box of the VOP is calculated and extended to 
multiples of the macroblock size. 
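
The bounding-box extension can be sketched as follows. This is a minimal illustration under the 16x16 macroblock size stated above; the function name `macroblock_grid` and its return layout are hypothetical, chosen only for this example.

```python
MB_SIZE = 16  # macroblock edge length in pixels, as stated in the text


def macroblock_grid(bbox_width, bbox_height):
    """Extend a VOP bounding box to multiples of the macroblock size.

    Hypothetical helper. Returns a tuple
    (padded_width, padded_height, mb_columns, mb_rows).
    """
    if bbox_width <= 0 or bbox_height <= 0:
        raise ValueError("bounding box dimensions must be positive")
    mb_columns = -(-bbox_width // MB_SIZE)  # ceiling division
    mb_rows = -(-bbox_height // MB_SIZE)
    return mb_columns * MB_SIZE, mb_rows * MB_SIZE, mb_columns, mb_rows
```

For a 176 x 144 (QCIF) rectangular VOP no padding is needed and the grid is 11 x 9 macroblocks, whereas a 100 x 50 bounding box would be extended to 112 x 64.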

[0037] However, for most current applications, and in particular for wireless 
applications using the so-called "simple profile," there is generally only 1 VOP per frame, 
which is a rectangular VOP. For clarity, the term frame, as used herein, can also include a 
VOP, such as an MPEG-4 VOP, or a picture. Similarly, the term VOP, as used herein, can 
also refer to a frame. In MPEG-4, the VOPs can be structured in groups of video object 
planes (GOV). Using MPEG-2 terminology, frames or pictures can be arranged in groups of 
pictures (GOPs). For clarity, the term "scene" as used herein, may also refer to a GOV or a 
GOP and vice versa. 

[0038] A frame or video object may be encoded as an intracoded frame (an 
"I-frame" or "I-VOP"), as a predicted frame (a "P-frame" or "P-VOP"), or as a bi-directional 
frame (a "B-frame" or "B-VOP"). MPEG-1 also provides for a D-frame. A D-frame is a 
frame that has no motion vectors, so that a zero vector is assumed, and has texture DCT data. 
To exploit spatial redundancy, a Discrete Cosine Transformation (DCT) is performed on the 
encoded frames and the resulting coefficients are quantized. 

[0039] The MPEG-4 simple profile does not support B-frames or B-VOPs. 
However, the simple profile does support frame skipping. A video frame can be skipped in 
order to reduce the resulting file size or bit-rate. Because the MPEG-4 simple profile does 
not support B-frames or D-frames, the following discussions will not focus on such frames. 
Nonetheless, embodiments of the present invention can be used with B-frames and D-frames 
in accordance with other profiles and other standards. 

[0040] The term frame can correspond to either an interlaced frame or to a 
non-interlaced frame, i.e., a progressive frame. In an interlaced frame, each frame is made of two 
separate fields, which are interlaced together to create the frame. Such interlacing is not 
performed in a non-interlaced or progressive frame. While illustrated in the context of 
non-interlaced or progressive video, one of ordinary skill in the field will appreciate that the 
principles and advantages described herein are applicable to both interlaced video and non- 
interlaced video. In addition, while embodiments of the invention are described in the 
context of MPEG-4, aspects of the principles and advantages described herein are also 
applicable to other video standards, including, by way of example, MPEG-1, MPEG-2, 
H.261, H.262, H.263, H.263+, H.263++, and H.26L, as well as video standards yet to be 
developed. 

[0041] An intracoded I-frame typically only includes information from the image 
itself and thus an I-frame can be decoded independently of other frames. P and B frames are 
also referred to as intercoded frames because they are encoded based on data from other 
frames. The preprocessing module 102B generates a file, referred to as an input frame-type 
file, containing the frame-type designations corresponding to the frames. In other 
embodiments, the frame-type information is passed to other portions of the encoding system 
100B using variables and the like. While the preprocessing module 102B is illustrated in 
Figure 1B as being included in the encoding system 100B, the preprocessing module 102B 
can be physically separate from the other portions of the encoding system 100B. In such an 
embodiment, the preprocessing module 102B can produce a text file that includes frame-type 
designations that are then input by the remainder of the encoding system 100B. 

[0042] Many standard MPEG-4 encoders can only handle one scene, that is, one 
I-frame followed by P-frames or B-frames, or they introduce regular I-frames every k frames, 
as is commonly done in MPEG-2 encoding. These approaches make implementation of the 
encoder simpler, however they place the burden on the user to determine how clips of 
multiple scenes are to be handled. To adequately improve the coding efficiency, the number 
of I-frames should be reduced or minimized. In the absence of error conditions, I-frames are 
preferably used at scene changes only. Thus, it is advantageous to correctly and accurately 
detect scene changes. 

[0043] An example scene change detection process in accordance with an 
embodiment of the present invention will now be described. In the example embodiment, the 
process operates on YUV-4:2:0 files and produces a text file as an output. In the example 
embodiment, YUV-4:2:0 files are header-less files with concatenated frames, where, for each 
frame, the (luminosity) Y-pixel values are provided first, followed by the (Chromaticity-blue) 
Cb-values, and then the (Chromaticity-red) Cr-values. The term "4:2:0" indicates that 
chromaticity values are subsampled by a factor of 4 with respect to luminosity. In particular, if 
the size of a frame (in pixels) is W x H (W: width, H: height), there are W*H Y-values (1 for 
each frame pixel), (W/2)*(H/2) Cb-values and (W/2)*(H/2) Cr-values for each frame. That 
gives a total of 3*W*H/2 bytes as the frame buffer size needed to store a frame of size W x 
H. The subsampling for chromaticity components is achieved by subsampling along the 
vertical and horizontal dimensions by a factor of 2. Thus, a 2 x 2 block has 4 luminosity 
values, and 1 chromaticity-blue and 1 chromaticity-red. In other embodiments, other formats 
for storing image data can be used. 
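
The frame layout described above can be sketched as follows. This is a minimal illustration assuming 8-bit samples and even frame dimensions; the function names `yuv420_frame_size` and `split_yuv420_frame` are hypothetical.

```python
def yuv420_frame_size(width, height):
    """Bytes for one 8-bit YUV 4:2:0 frame: W*H + 2*(W/2)*(H/2) = 3*W*H/2."""
    if width <= 0 or height <= 0 or width % 2 or height % 2:
        raise ValueError("this sketch assumes positive, even frame dimensions")
    return width * height + 2 * (width // 2) * (height // 2)


def split_yuv420_frame(buf, width, height):
    """Split the bytes of one frame into its (Y, Cb, Cr) planes.

    The planes are stored back to back in that order, with no header.
    """
    if len(buf) != yuv420_frame_size(width, height):
        raise ValueError("buffer length does not match the frame size")
    y_len = width * height
    c_len = (width // 2) * (height // 2)
    return buf[:y_len], buf[y_len:y_len + c_len], buf[y_len + c_len:]
```

Because the file has no header, the frame dimensions must be known in advance; successive frames can then be read in fixed-size chunks of `yuv420_frame_size(width, height)` bytes.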

[0044] The preprocessing module 102B will now be described in greater detail. 
The preprocessing module 102B performs frame evaluation and encoding designation. As 
will be described below, each frame is designated by the preprocessing module 102B as an 
I-frame, a P-frame, or as a skipped frame. In other embodiments, the preprocessing module 
102B may also designate frames as B-frames or D-frames. B-frame encoding may be 
performed if there is sufficient computational power, available bandwidth (B-frames take 
much more bandwidth than skipped frames), and if allowed by the corresponding standard. 
For example, the MPEG-4 simple-profile syntax, used in wireless networks, does not allow 
for B-frames. The example file format generated by the preprocessing module 102B includes 
a line per input frame, with a frame-type designation character on each line: 0, 1 or 2. A "0" 
indicates an I-frame, a "1" indicates a P-frame, and a "2" indicates a skipped frame. In other 
embodiments, designations can be provided for a bidirectional frame and a D-frame. 
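
A reader for the frame-type file described above might look like the following sketch. The function name `parse_frame_types` and the labels "I", "P" and "skip" are hypothetical; only the one-character-per-line layout and the meanings of 0, 1 and 2 come from the description.

```python
# Designation characters as described in the text.
FRAME_TYPES = {"0": "I", "1": "P", "2": "skip"}


def parse_frame_types(text):
    """Parse frame-type file contents: one designation character per line.

    Hypothetical helper. Returns one label per input frame, in frame order.
    """
    labels = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        code = line.strip()
        if code not in FRAME_TYPES:
            raise ValueError(f"line {line_no}: unknown frame type {code!r}")
        labels.append(FRAME_TYPES[code])
    return labels
```

For example, the contents "0\n1\n2\n1\n" describe an I-frame, a P-frame, a skipped frame, and another P-frame.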

[0045] As previously discussed, scene change frames are generally intracoded. 
To locate the scene change frames, the preprocessing module's scene change analysis 
performs a color-weighted Root Mean Squared (RMS) calculation and a Mean Absolute 
Differences (MAD) calculation between the i-th frame F_i and the k-th frame F_k. The RMS 
can be defined as: 

RMS(F_i, F_k) = \sqrt{ \frac{1}{\alpha+\beta+\gamma} \cdot \frac{1}{w \times h} \sum_{x=1}^{w} \sum_{y=1}^{h} \left[ \alpha \left| Y_i(x,y) - Y_k(x,y) \right|^2 + \beta \left| U_i(x,y) - U_k(x,y) \right|^2 + \gamma \left| V_i(x,y) - V_k(x,y) \right|^2 \right] }    Equation 1 

[0046] where F(x,y) denotes the (x,y)-th pixel in frame F, and w and h are the 
width and height of the frame, respectively. Y(x, y) indicates the luminance value, while 
U(x, y) and V(x, y) are the two chromaticity components. The coefficients \alpha, \beta, and \gamma are 
weighting coefficients for the luminosity, chromaticity-blue and chromaticity-red 
components correspondingly. To ease computations, the weighting coefficients can be fixed. 
For example, the weighting coefficients can be set as follows: \alpha = \beta = \gamma = 1. 
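
A color-weighted RMS difference of this kind can be sketched as follows. This is an illustration under stated assumptions, not a definitive implementation: the function name `weighted_rms` is hypothetical, the three planes of each frame are assumed to have the same number of samples (for example, chromaticity upsampled to the luminosity resolution), and normalizing by the sum of the weights and by the pixel count is an assumption of this sketch.

```python
import math


def weighted_rms(frame_i, frame_k, alpha=1.0, beta=1.0, gamma=1.0):
    """Color-weighted RMS difference between two frames.

    Hypothetical helper. Each frame is a (Y, U, V) tuple of flat,
    equally sized sequences of sample values.
    """
    y_i, u_i, v_i = frame_i
    y_k, u_k, v_k = frame_k
    n = len(y_i)  # number of pixels, w * h
    if n == 0 or any(len(p) != n for p in (u_i, v_i, y_k, u_k, v_k)):
        raise ValueError("all planes must be non-empty and the same size")
    total = 0.0
    for idx in range(n):
        total += (alpha * (y_i[idx] - y_k[idx]) ** 2
                  + beta * (u_i[idx] - u_k[idx]) ** 2
                  + gamma * (v_i[idx] - v_k[idx]) ** 2)
    # Assumed normalization: sum of weights times pixel count.
    return math.sqrt(total / ((alpha + beta + gamma) * n))
```

With all three weights fixed at 1, the result is the ordinary RMS difference taken over every sample of the three planes.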

[0047] The Mean Absolute Difference (MAD) measure can be defined as follows: 

MAD(F_i, F_k) = \frac{1}{w \times h} \sum_{x=1}^{w} \sum_{y=1}^{h} \left| Y_i(x,y) - Y_k(x,y) \right|    Equation 2 

[0048] In this example, the MAD does not need to include the two chromaticity 
components. 

[0049] If MAD(F_i, F_k) and/or RMS(F_i, F_k) are large or greater than a selected 
criterion, this indicates that the content of F_i is substantially different from F_k. Thus in one 
embodiment, if the MAD between consecutive frames, MAD(F_{i-1}, F_i), is larger than a 
pre-specified threshold, F_i is designated a scene change frame. An example threshold value for 
designating a scene change frame is approximately 25. 
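
The MAD test between consecutive frames can be sketched as follows. The function names are hypothetical; the luminosity planes are passed as flat sequences of equal length, and the default threshold is the example value of 25 given above.

```python
def mean_absolute_difference(y_i, y_k):
    """Mean absolute difference between two luminosity planes."""
    n = len(y_i)
    if n == 0 or len(y_k) != n:
        raise ValueError("planes must be non-empty and the same size")
    return sum(abs(a - b) for a, b in zip(y_i, y_k)) / n


def is_scene_change_by_mad(y_prev, y_cur, threshold=25.0):
    """True if the MAD between consecutive frames exceeds the threshold."""
    return mean_absolute_difference(y_prev, y_cur) > threshold
```

Only the luminosity plane is compared, in line with the statement that the MAD does not need the two chromaticity components.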

[0050] Optionally, a second temporal derivative of the RMS can be used to 
determine if a frame is a scene change frame, as follows: 

Equation 3 

[0051] As defined by Equation 3, the second temporal derivative of the RMS is 
based on the RMS value for the previous frame F_{i-1} relative to the current frame F_i, the RMS 
value of the current frame F_i relative to the next frame F_{i+1}, and the RMS value of the next 
frame F_{i+1} relative to the subsequent frame F_{i+2}. 

[0052] The second temporal derivative of the RMS value will be negative with 
relatively high amplitude when F_i is a scene-change frame, as illustrated in Figure 3. Thus, 
if the absolute value of the second temporal derivative of the RMS value is larger than a 
pre-specified threshold, F_i is designated a scene change frame. As illustrated in Figure 3, there 
is a correlation between the RMS values, indicated by the diamonds, and the second 
derivative of RMS, indicated by the triangles. Thus, both the RMS values and the values of the 
second derivative of RMS generally provide a correct indication of a scene change. An 
example second derivative of RMS threshold value for determining a scene change is -6.5. 
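
The peak test can be sketched as follows. Because Equation 3 is not reproduced in this text, the sketch assumes the standard central second difference over a sequence of consecutive-frame RMS values; the indexing convention and the function names are assumptions, not the embodiment's definition.

```python
def second_difference(values):
    """Central second difference; None where a neighbour is missing."""
    out = [None] * len(values)
    for i in range(1, len(values) - 1):
        out[i] = values[i - 1] - 2 * values[i] + values[i + 1]
    return out


def peaks_by_second_difference(rms_values, threshold=-6.5):
    """Indices whose second difference falls below the negative threshold."""
    d2 = second_difference(rms_values)
    return [i for i, v in enumerate(d2) if v is not None and v < threshold]


# Assumed input: rms[i] is the RMS between two consecutive frames.
rms = [2.0, 2.5, 30.0, 3.0, 2.0]
print(second_difference(rms))
print(peaks_by_second_difference(rms))
```

A sharp peak in the RMS sequence yields a large negative second difference at the peak and positive values on either side, which is why the sign is tested as well as the magnitude.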

[0053] While the second derivative of the RMS is a good peak detector, it is 
somewhat sensitive to noise. To increase the accuracy of the scene change 
determination, in one embodiment a frame is designated a scene change frame only if both 
temporal activity measures, the MAD and the second-order derivative of RMS, meet or pass 
their corresponding thresholds. As will be described below, scene change frames will be 
intracoded as I-frames or I-VOPs. 

[0054] In particular, in one example a frame is designated as a scene change, and 
thus will be coded in INTRA mode, when its MAD is greater than 20 and the second 
derivative of RMS is negative and has an absolute value of greater than 4. In another 
example, a frame is designated as a scene change, and thus will be coded in INTRA mode, 
when its RMS is greater than 40 and/or when the second derivative of RMS is negative and 
has an absolute value of greater than 8. In other embodiments other thresholds can be used. 
Alternatively or in addition, a second derivative of MAD can be used, as similarly described 
above with respect to the second derivative of RMS, as a further indication of whether a 
frame corresponds to a scene change or not. 
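
The two example rules above can be written as predicates. This is a sketch under stated assumptions: the function names are hypothetical, and the "and/or" of the second example is implemented as a logical OR.

```python
def is_scene_change_mad_and_d2(mad_value, d2_rms, mad_min=20.0, d2_min_abs=4.0):
    """First example: MAD > 20 AND the second derivative of RMS is negative
    with an absolute value greater than 4."""
    return mad_value > mad_min and d2_rms < 0 and abs(d2_rms) > d2_min_abs


def is_scene_change_rms_or_d2(rms_value, d2_rms, rms_min=40.0, d2_min_abs=8.0):
    """Second example: RMS > 40 OR the second derivative of RMS is negative
    with an absolute value greater than 8."""
    return rms_value > rms_min or (d2_rms < 0 and abs(d2_rms) > d2_min_abs)


print(is_scene_change_mad_and_d2(22.0, -5.0))
print(is_scene_change_rms_or_d2(10.0, -3.0))
```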

[0055] An additional criterion can be used to determine when a scene change has 
occurred. For example, in one embodiment, a determination is made as to whether the MAD 
value is a local maximum, that is, has increased from a previous frame to the frame at issue, 
and then decreased from the frame at issue to the next frame. If so, this indicates that it is 
likely the frame at issue is a scene change frame and should be intracoded. In addition, a 
similar determination may be made for the RMS value. For example, a determination is 
made as to whether the RMS value is a local maximum, that is, has increased from a previous 
frame to the frame at issue, and then decreased from the frame at issue to the next frame. If 
so, this too indicates that it is likely the frame at issue is a scene change frame and should be 
intracoded. 

[0056] Optionally, a voting process can be used, wherein if at least two of the 
RMS, the second derivative of the RMS, and the MAD, meet corresponding criteria, then a 
frame is designated as a scene change that is to be intracoded. In another embodiment, if the 
RMS and second derivative of the RMS meet the corresponding criteria, and if the MAD is a 
local maximum, then the frame is designated as a scene change frame. In still another 
embodiment, if the RMS and MAD meet the corresponding criteria, and if the MAD is a 
local maximum, then the frame is designated as a scene change frame. 
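
The local-maximum test and the voting process can be sketched as follows. The helper names are hypothetical, and the per-measure threshold tests are assumed to have been evaluated already and passed in as booleans.

```python
def is_local_maximum(values, i):
    """True if values[i] rose from the previous frame and falls to the next."""
    if i <= 0 or i >= len(values) - 1:
        return False  # no neighbour on one side
    return values[i - 1] < values[i] > values[i + 1]


def vote_scene_change(rms_ok, d2_rms_ok, mad_ok, votes_needed=2):
    """Voting process: at least `votes_needed` of the three criteria are met."""
    return sum([rms_ok, d2_rms_ok, mad_ok]) >= votes_needed


mad_values = [3.0, 4.0, 28.0, 5.0]
print(is_local_maximum(mad_values, 2))
print(vote_scene_change(True, False, True))
```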

[0057] To further improve coding efficiency and to meet target bit rates, the 
number of frames that needs to be encoded per second is preferably reduced as much as 
acceptable. One technique used to reduce the number of frames encoded per second is to 
skip frames in the encoding process. Two example frame-skipping techniques are fixed 
frame skipping and adaptive frame skipping. Conventional encoders drop frames to reduce 
the frame rate according to a simple skipping algorithm. For example, a conventional 
encoder will drop 4 of every 5 frames in a video clip to convert the video clip from a 30 
frames per second rate to a 6 frames per second rate. 

[0058] As will be discussed below, fixed frame skipping tends to provide better 
error resiliency in noisy environments, while adaptive frame skipping tends to provide better 
visual results in low noise environments. Optionally, the preprocessing module 102B 
calculates, based on the bit-rate/frame-rate formula defined in Equation 7 below, the target 
encoding frame-rate and then switches between adaptive and fixed skipping in order to meet 
a target encoding frame rate. 

[0059] In fixed frame skipping, the input video frame sequence is subsampled 
along the time axis, by keeping 1 in every k frames, where k is the subsampling factor. For 
example, if: 

k = 5 

and the original video sequence has a frame rate = 25 frames-per-second (fps); then 
the subsampled sequence has a frame rate = 5 fps. 
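
Fixed frame skipping amounts to temporal subsampling, as the following sketch shows. The function names are hypothetical, and keeping the first frame of each group of k is an assumption.

```python
def fixed_skip(frames, k):
    """Keep 1 in every k frames (the first frame of each group of k)."""
    return frames[::k]


def subsampled_rate(source_fps, k):
    """Frame rate after keeping 1 in every k frames."""
    return source_fps / k


one_second = list(range(25))      # frame indices for one second at 25 fps
print(fixed_skip(one_second, 5))  # prints [0, 5, 10, 15, 20]
print(subsampled_rate(25, 5))     # prints 5.0
```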

[0060] In adaptive frame skipping, as in fixed frame skipping, the input video 
frame sequence is subsampled along the time axis in order to achieve a desired or 
predetermined average frame rate. However, rather than skip frames in a fixed manner, using 
adaptive frame skipping the rate of frame skipping can be irregular and can vary along the 
sequence length. Preferably, low activity frames are identified and skipped, and scene- 
change frames are kept and intracoded. Non-scene-change frames having some activity are 
intercoded. Because the skipped frames are intelligently selected based on changes in visual 
activity, the visual result when reproduced by the decoder will be better than with fixed frame 
skipping, assuming no or relatively few errors occur. 

[0061] In one embodiment, the preprocessing module 102B codes skipped frames 
using a "not_coded" bit-flag or indicator set in the video object plane (VOP) header in an 
MPEG bit stream. An MPEG-4 video packet starts with the VOP header or the video packet 
header, followed by motion_shape_texture(), and ends with next_resync_marker() or 
next_start_code(). Thus, a VOP specifies particular image sequence content and is coded into 
a separate video object layer by coding contour, motion and texture information. 

[0062] In another embodiment, a skipped frame is skipped altogether, without 
inserting VOP-header information in the bitstream. A skipped frame may be recreated by a 
decoder using interpolation or by repeating a previous frame. The decoder may perform 
interpolation by pixel averaging between a preceding frame and a subsequent decoded frame, 
weighted by their time difference. 
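
A sketch of such decoder-side interpolation follows. It assumes linear weights, so that the temporally closer neighbor contributes more; the text does not specify the exact weighting, and the function name and time arguments are hypothetical.

```python
def interpolate_frame(prev_frame, next_frame, t_prev, t_next, t):
    """Pixel-wise average of two decoded frames, weighted by time distance.

    Assumption: linear weights, so the nearer frame contributes more.
    """
    span = t_next - t_prev
    w_next = (t - t_prev) / span  # weight of the later frame
    w_prev = 1.0 - w_next         # weight of the earlier frame
    return [
        [w_prev * p + w_next * n for p, n in zip(row_p, row_n)]
        for row_p, row_n in zip(prev_frame, next_frame)
    ]


# Recreate the frame at time 1 from decoded frames at times 0 and 4.
print(interpolate_frame([[0, 100]], [[100, 200]], 0, 4, 1))
```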

[0063] The information regarding the video sequence frame rate is normally 
carried in the Video-Object-Layer (VOL) header. In particular, a parameter, referred to as 
vop_time_increment_resolution, determines the number of time units for each encoding 
cycle. The vop_time_increment value in the Video-Object-Plane (VOP) header carries the 
time stamp for each frame. 

[0064] The vop_time_increment_resolution may be, by way of example, a 16 bit 
unsigned integer value. For example, for 25 frames per second (fps), 
vop_time_increment_resolution = 25, and vop_time_increment cycles through the values 
0, ..., 24. For 7.5 fps, vop_time_increment_resolution = 75, and vop_time_increment cycles 
through the values 0, 10, 20, 30, 40, 50, 60, 70, 5, 15, 25, 35, 45, 55, 65. Thus, successful 
decoding of the first 2 frames of a sequence can yield the exact frame rate. But, given the 
error-prone environment of wireless communication or other error-prone environments, the 
successful reception and decoding of any 2 consecutive frames cannot be guaranteed. Thus, 
the decoder may incorrectly determine the frame rate of the entire sequence. For that reason, 
in one embodiment, the "fixed_vop_rate" flag is set by the encoder module 106B in the VOL 
header, which then provides (via the value of fixed_vop_time_increment) the default frame 
rate. This technique better ensures the successful decoding or determination of the frame rate 
upon the successful decoding of the VOL header. The fixed_vop_time_increment value can 
later be stored in a global variable of the decoder, which will use the value to determine 
whether certain frames need to be interpolated or not. The frames to be interpolated can 
either be frames skipped by the encoder, or lost during transmission. Thus, the error- 
resilience performance of the MPEG-4 decoder will be enhanced because it will decode the 
correct number of frames, thereby avoiding loss-of-sync problems with the audio stream. 
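
The time-stamp cycles quoted above can be reproduced with a short sketch. It models vop_time_increment simply as a tick counter taken modulo vop_time_increment_resolution, which is a simplification; the function name and the `ticks_per_frame` parameter are hypothetical.

```python
def vop_time_increments(resolution, ticks_per_frame, count):
    """Time stamps of `count` successive frames, wrapping at `resolution`."""
    return [(n * ticks_per_frame) % resolution for n in range(count)]


# 25 fps: resolution 25, one tick per frame.
print(vop_time_increments(25, 1, 25))
# 7.5 fps: resolution 75, ten ticks per frame.
print(vop_time_increments(75, 10, 15))
```

The difference between the time stamps of two consecutive frames, together with the resolution, gives the frame rate, which is why losing either of those frames can mislead a decoder that has no fixed_vop_time_increment to fall back on.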

[0065] As discussed above, for error-prone environments, error resiliency can be 
enhanced by utilizing fixed frame skipping rather than adaptive frame skipping. Fixed frame 
skipping enables the decoder to better determine when a frame has been dropped or skipped. 
Another approach to enhancing error resiliency is to use adaptive frame skipping, but provide 
a VOP-header with the not_coded flag set for a skipped frame. One drawback of this 
approach is that it results in a slight increase in bit rate due to the more frequent VOP- 
headers. 

[0066] Figure 4A illustrates one example process 400 of adaptive frame skipping. 
The process is iterative in that a frame is selectively dropped from the sequence of frames by 
computing a mean of absolute differences between the frames adjacent to the frame of 
interest, and by weighting the computation with a temporal parameter, wherein the frame 
having the least impact on the scene is dropped. This procedure is repeatedly iterated until a 
target frame rate, which is related to the desired bit rate and frame size, is achieved. 

[0067] Beginning at start state 402, the process 400 proceeds to state 404. The 
desired frame rate is set or specified. The desired frame rate may be user specified or may be 
dynamically determined. Proceeding to state 406, the cost function, or adverse impact, that 
would result from dropping a particular frame is calculated for each frame between the first 
and last frame in a scene. As described in greater detail below, the cost function can be based 
at least in part on the mean absolute differences (MAD) between frames closely or most 
closely bracketing or bounding the particular frame of interest, or on the sum of the mean 
absolute differences (SMAD). Alternatively or in addition, the cost function can be based on 
sums of RMS (SRMS) for frames bracketing the particular frame of interest. 

[0068] Proceeding to state 408, the frame associated with the lowest cost, that is, 
having the least adverse impact on visual quality, is skipped or dropped. At state 410 a 
determination is made as to whether the remaining frames will allow the target frame rate to 
be met. If the target frame rate can now be met, the adaptive frame rate skipping process 400 
proceeds to the end state 414. Otherwise, the process 400 proceeds to state 412, and a 
remaining frame having the lowest cost will be dropped. The cost of all the frames 
remaining between the first and last frames may be recalculated at state 412 based on the 
frames that are currently remaining, and the frame with the lowest cost will be dropped. The 
process 400 repeats states 410 and 412 until the target frame rate has been met, or the number 
of already consecutively skipped frames is at a specified maximum. While the process 400 is 
described with respect to selecting which frames to skip, the process 400 can be similarly 
used to decide what frames should be bi-directionally encoded. 
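
The iterative structure of process 400 can be sketched as a greedy loop. This is a sketch only: the cost function is passed in as a callable, the target is expressed as a frame count rather than a frame rate, and an infinite cost is assumed to mark a frame that may not be skipped (for example because the maximum run of skipped frames would be exceeded). All names are hypothetical.

```python
import math


def greedy_skip(num_frames, target_count, cost):
    """Drop the lowest-cost interior frame until `target_count` frames remain.

    `cost(left, candidate, right)` is recomputed on the frames still kept;
    the first and last frames of the scene are never dropped.
    """
    kept = list(range(num_frames))
    while len(kept) > target_count:
        best_pos, best_cost = None, math.inf
        for pos in range(1, len(kept) - 1):
            c = cost(kept[pos - 1], kept[pos], kept[pos + 1])
            if c < best_cost:
                best_pos, best_cost = pos, c
        if best_pos is None:
            break  # every remaining candidate is forbidden
        kept.pop(best_pos)
    return kept


# Purely temporal toy cost: the gap that skipping the candidate would create.
print(greedy_skip(6, 3, lambda left, cand, right: right - left))
```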

[0069] One example embodiment of an adaptive frame skipping selection process 
using the weighted sum of mean absolute differences will now be described in greater detail. 
Given an input sequence of a frame rate r_orig and the desired frame rate r_des (where 
r_orig > r_des), the video sequence analyzer skips frames in a "greedy" way, that is, until r_des is 
met. The weighted sum of MAD and the time difference between two frames are used in 
specifying a cost function for identifying the next frame to be skipped. The frame whose cost 
function meets predetermined criteria or is minimum among remaining candidates for 
skipping is skipped. 

[0070] By way of example, referring to Figure 4B, if using the greedy approach 
F_3, F_5, and F_6 have already been skipped, F_4 is now considered as a candidate for the 
next skipped frame. In order to determine if F_4 is to be skipped, a cost function is calculated 
assuming that F_4 has been skipped. Upon skipping F_4, F_2 and F_7 will be the left and right 
frames bounding the skipped segment of F_3-F_6. The cost function is then defined as: 

[0071] $$\mathrm{Cost} = \mathrm{MAD}(F_2, F_7) + \lambda \, \frac{29.97}{r_{orig}} \, TD \qquad \text{Equation 5}$$

[0072] Where the term 29.97 / r_orig is used to normalize the original frame rate r_orig 
with respect to the NTSC frame rate of 29.97 frames/second and where TD denotes the time 
difference measure. Of course other or different normalizations may be used as well. 

[0073] Thus, in this example TD is 5 (=7-2), and λ is a weighting coefficient. In 
this example, the weighting coefficient value λ = 5.0, determined experimentally, provides a 
suitable result. Alternatively, the weighting coefficient value may be dynamically 
determined. At the current state, the frame whose cost function is minimum among the 
candidates is skipped. This process is iteratively performed until the desired frame rate is 
met. The time difference measure TD can be modified such that no more than n consecutive 
frames are skipped. For example, if a predetermined maximum number of consecutive 
frames that can be skipped simultaneously is set to 4, TD can be modified as follows: 

$$TD' = \begin{cases} TD & \text{if } TD \le 5 \\ \infty & \text{otherwise} \end{cases}$$

where ∞ is infinity. 
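
The cost of Equation 5, with the modified time difference TD', can be sketched as follows. The function names are hypothetical, and the limit is written as `max_consecutive_skips + 1` on the assumption that a run of 4 skipped frames corresponds to a time difference of 5 between the bounding frames.

```python
import math


def time_difference(left, right, max_consecutive_skips=4):
    """TD': frame-index distance, or infinity if the skipped run is too long."""
    td = right - left
    return td if td <= max_consecutive_skips + 1 else math.inf


def skip_cost(mad_left_right, left, right, r_orig, lam=5.0,
              max_consecutive_skips=4):
    """Cost = MAD(F_left, F_right) + lam * (29.97 / r_orig) * TD'."""
    td = time_difference(left, right, max_consecutive_skips)
    return mad_left_right + lam * (29.97 / r_orig) * td


# Figure 4B example: bounding frames F_2 and F_7, so TD = 5.
print(skip_cost(10.0, 2, 7, r_orig=29.97))  # prints 35.0
```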

[0074] Optionally, the complexity for the MAD calculation can be reduced, 
though the accuracy will be reduced, if only even (or only odd) coordinate pixels are used in 
the calculation. For example: 

$$\mathrm{MAD}(F_i, F_k) = \frac{1}{(w/2) \times (h/2)} \sum_{x=1}^{w/2} \sum_{y=1}^{h/2} \left| Y_i(2x,2y) - Y_k(2x,2y) \right| \qquad \text{Equation 6a}$$
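
A sketch of the reduced-complexity MAD follows. It assumes the 1-based pixel coordinates (2x, 2y) of Equation 6a map to 0-based indices 1, 3, 5, ...; the function name is hypothetical.

```python
def mad_subsampled(frame_a, frame_b):
    """MAD over even-coordinate pixels only (every second row and column)."""
    diffs = [
        abs(a - b)
        for row_a, row_b in zip(frame_a[1::2], frame_b[1::2])
        for a, b in zip(row_a[1::2], row_b[1::2])
    ]
    return sum(diffs) / len(diffs)


frame_a = [
    [0, 0, 0, 0],
    [0, 4, 0, 8],
    [0, 0, 0, 0],
    [0, 12, 0, 16],
]
frame_b = [[0] * 4 for _ in range(4)]
print(mad_subsampled(frame_a, frame_b))  # prints 10.0
```

Only one quarter of the pixels are visited, which is the source of both the saving and the loss of accuracy.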

[0075] Another example process to adaptively determine which frame to skip 
estimates the total induced distortion, both spatial and temporal, for each candidate frame for 
skipping, and then skips the frame whose absence would result in the least distortion. As 
described below, the process utilizes sums of MAD (SMAD) or sums of RMS (SRMS). 
Advantageously, all the MADs do not have to be recalculated. Instead, the already calculated 
appropriate MADs are summed differently, depending on which frame is being considered 
for skipping. 

[0076] In one embodiment, the process is performed as follows. Assume that F_j 
is the frame currently being considered for skipping. For example, assume F_j is frame F_4 in 
Figure 4B. Then let F_i indicate the previous non-skipped frame, F_2 in this example, and let 
frame F_k indicate the next non-skipped frame, F_7 in our example. Then the cost function can 
be defined as follows: 

$$\mathrm{Cost} = \left| \mathrm{SMAD}_{est}(F_i, F_k) - \mathrm{SMAD}_{est}(F_i, F_j) - \mathrm{SMAD}_{est}(F_j, F_k) \right| + \lambda \, \frac{29.97}{r_{orig}} \, TD \qquad \text{Equation 6b}$$

[0077] where SMAD_est(F_i, F_k) is the estimated spatial distortion when skipping 
frames (i+1), ... (k-1). 

[0078] As can be seen from Equation 6b, the cost function subtracts out the 
contribution from previously skipped frames. The distortion is calculated as follows: 

$$\mathrm{SMAD}_{est}(F_i, F_k) = \varphi(k-i-1) \cdot \sum_{j=i+1}^{k} \mathrm{MAD}(F_{j-1}, F_j) \qquad \text{Equation 6c}$$

[0079] where φ(n) is a coefficient that depends on the number (n) of consecutive 
skipped frames and takes into account how much, on the average or based on a statistical 
sampling, interpolated frames at the decoder are different from the original frames. The 
following table provides example coefficient values determined experimentally using typical 
video sequences: 

n       0      1      2      3      4      5      6      >=7
φ(n)    0.0    0.746  0.886  1.010  1.198  1.276  1.348  1.500

Adaptive Frame Skipping Coefficient Table 



[0080] As can be seen, φ(n) increases as the number of skipped frames increases. 

[0081] To reduce the computational complexity and resources needed, optionally 
only the MAD between consecutive frames (j-1 and j) is used to estimate the distortion of a 
complete segment of skipped frames, as in Equation 6b above. The additional distortion 
when skipping a certain frame is calculated and the temporal component is added. 
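
The segment distortion estimate can be sketched using the coefficient table and the already computed consecutive-frame MADs. This sketch assumes the sum runs over the consecutive pairs from frame i to frame k; the function names and the list layout are hypothetical.

```python
# Example coefficients from the Adaptive Frame Skipping Coefficient Table.
PHI_TABLE = [0.0, 0.746, 0.886, 1.010, 1.198, 1.276, 1.348]


def phi(n):
    """Coefficient for n consecutive skipped frames (1.500 for n >= 7)."""
    return PHI_TABLE[n] if n < len(PHI_TABLE) else 1.500


def smad_est(consecutive_mad, i, k):
    """Estimated distortion of the segment between kept frames i and k.

    consecutive_mad[j] holds MAD(F[j-1], F[j]); index 0 is unused.
    """
    return phi(k - i - 1) * sum(consecutive_mad[j] for j in range(i + 1, k + 1))


consecutive_mad = [0.0, 1.0, 2.0, 3.0, 4.0]
print(smad_est(consecutive_mad, 0, 1))  # no skipped frame, prints 0.0
print(smad_est(consecutive_mad, 0, 2))  # one skipped frame
```

Because only the stored consecutive MADs are re-summed, evaluating a new candidate does not require revisiting pixel data.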

[0082] In one embodiment, a user specifies the desired encoding frame rate. The 
desired frame rate can be based on the video sequence statistics, such as temporal and spatial 
complexity, frame size, frame rate and target bit rate or compression ratio. In another 
embodiment, a heuristic can be used to select the desired frame rate. The following is an 
example heuristic equation for calculating the frame rate, normalized with respect to the 
Quarter Common Intermediate Format (QCIF) frame size, containing 144 lines and 176 
pixels per line: 

$$\text{Encoding frame rate} = \frac{\text{target bit rate in kbps}}{8} \cdot \sqrt{\frac{176 \times 144}{w \times h}} \qquad \text{Equation 7}$$

[0083] where w and h are the frame dimensions. In order for the encoding frame 
rate to remain within reasonable bounds, the encoding frame rate should preferably be in the 
range of: 

[1 up to, and including, the source frame rate] 

[0084] In addition, in order to make adaptive frame skipping independent of the 
temporal complexity of a specific scene, the weighting coefficient is optionally set equal to 
the average RMS or MAD of the designated entire sequence. 

[0085] Thus, to determine which frames are scene change frames, and to 
adaptively select which frames are to be skipped, the video sequence analyzer is provided 
with the video sequence, the frame width, frame height, the source frame rate, the target bit 
rate, and the setting of the error resilience flag in the following format: 

<input file> <width> <height> <source frame rate> <target bit rate> <error resilience flag> 

[0086] The error resilience flag is set by the user to switch between adaptive 
frame skipping, which has less error resiliency but a better visual result when there are no or 
few errors, and fixed frame skipping, which provides better error resiliency with a lesser 
visual result. 

[0087] In one embodiment, as illustrated in Figure 1C, an RMS circuit 102C is 
used to calculate RMS values as described above, a Second Derivative of RMS circuit 104C 
is used to calculate the second derivative of RMS as described above, a MAD circuit 108C is 
used to calculate the MAD values as described above, a SUM OF MAD circuit 110C is used 
to calculate the SUM of MAD values as described above, and a Second Derivative of MAD 
circuit 114C is used to calculate the Second Derivative of MAD as described above. An 
Evaluator circuit 112C coupled to the outputs of the RMS circuit 102C, the Second 
Derivative of RMS circuit 104C, the MAD circuit 108C, the SUM OF MAD circuit 
110C, and the Second Derivative of MAD circuit 114C, is used to determine when a scene 
change has occurred and what frames to skip, based on one or more of the outputs, as 
discussed above. Of course different embodiments need not include all or any portion of the 
circuits illustrated in Figure 1C. 

[0088] The bit allocation module or circuit 104B will now be described. The bit 
allocation module or circuit 104B provides for bit allocation on a scene, frame, and/or 
macroblock level. The bit allocation module 104B reads the file or otherwise receives the 
information generated by the preprocessing module 102B, including the frame-type 
designations, and calculates a bit budget for each scene, GOV or GOP based on the coded 
frames. The bit allocation module determines an appropriate distribution of a fixed bit 
budget. 

[0089] As will be described below with respect to an example implementation, a 
first intracoded frame defines a beginning of a scene. A weight is assigned to the scene based 
on the number of intracoded frames and the number of intercoded frames, where intracoded 
frames are weighted more heavily than intercoded frames to account for the greater number 
of bits needed to encode an intraframe. The bit allocation module distributes the fixed bit 
budget within a scene by comparing the current bit usage and target bit usage, and based on 
the comparison, adjusts a quantization parameter or step size for the current frame. 

[0090] In particular, the bit-allocation module 104B first parses the input frame- 
type file from the preprocessing module 102B. The number or quantity of GOVs is then 
calculated. Based on the calculated bit budget, the encoder module 106B then encodes each 
GOV using the novel rate control process in accordance with an encoder parameter file, 
discussed below. 

[0091] For a given GOV or scene, the number of coded frames and the first and 
end frame are known. Using the following definitions, the bit budget for a GOV or scene is 
calculated using Equation 8 below: 

[0092] N_ci = number of coded P-VOPs (predicted, intercoded VOPs) in scene_i or 
GOV_i. 

B_i = bit budget for scene i. 

B = bit budget for a clip including one or more scenes 

N_c = number of coded frames for the clip 

N_s = number of scenes in clip. Usually, in the absence of the inclusion of 
consecutive I-frames for error resilience purposes, N_s = the number of I-VOPs 
(intracoded VOPs) 

T_c = equivalent total number of VOPs in clip 

[0093] An example method of determining the bit-budget for a scene or GOV is 
as follows. B_i bits are allocated for each GOV (i). In this example allocation, an assumption 
is made that the bit usage for one I-frame or I-VOP is close to or equal to the bit-usage of ten 
P-frames or P-VOPs (Ratio_Of_I_to_P = 10). However, the method is not limited to using 
the 1:10 ratio. B_i and T_c are then determined as follows: 

B_i = B * (N_ci + Ratio_Of_I_to_P) / T_c          Equation 8 

and 

T_c = N_c + (Ratio_Of_I_to_P - 1) * N_s          Equation 9 

[0094] As defined by Equation 8, the bit allocation for a given scene is based on 
the total number of frames in the scene, wherein an intracoded frame is normalized to be the 
equivalent of several predicted frames. In order to reduce the complexity and computational 
overhead, this example bit allocation formula does not take into account the spatial and 
temporal complexity of each GOV or GOP. In other embodiments, if sufficient 
computational and time resources are available, the bit allocation formula takes the temporal 
and spatial complexity into consideration to provide a still more intelligent bit allocation for 
each GOV. 
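
The scene-level budget of Equations 8 and 9 can be sketched as follows. The sketch assumes exactly one I-VOP per scene, so that the number of coded frames is the P-VOP count plus the number of scenes; the function name is hypothetical.

```python
def scene_bit_budgets(total_bits, p_vops_per_scene, ratio_i_to_p=10):
    """Split a clip's bit budget over its scenes (Equations 8 and 9).

    Assumption: each scene holds one I-VOP plus the listed number of P-VOPs.
    """
    n_s = len(p_vops_per_scene)            # number of scenes
    n_c = sum(p_vops_per_scene) + n_s      # coded frames in the clip
    t_c = n_c + (ratio_i_to_p - 1) * n_s   # equivalent VOP count, Equation 9
    return [
        total_bits * (n_ci + ratio_i_to_p) / t_c   # Equation 8
        for n_ci in p_vops_per_scene
    ]


print(scene_bit_budgets(1000, [10, 10]))  # prints [500.0, 500.0]
```

With this normalization the per-scene budgets always sum to the clip budget B, since the sum of (N_ci + Ratio_Of_I_to_P) over all scenes equals T_c.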

[0095] For example, in one embodiment a two-pass encoding process is used that 
takes into account spatial and temporal complexity. The first pass detects scene changes and 
collects frame complexity. The second pass performs the actual encoding using complexity 
guided bit allocation. 

[0096] The first pass process will now be described in greater detail. A new GOV 
or GOP is started from a scene change instance. The two-pass rate control process provides 
substantially uniform quality for each temporally segmented GOV so that quality variation is 
better limited to GOV or GOP boundaries. This approach is taken because minimizing 
quality variation, as measured by the Peak Signal to Noise Ratio (PSNR), the Root Mean 
Square Error or other image fidelity metric, among different scenes provides less benefit with 
respect to human visual perception. 

[0097] In characterizing the relative frame complexity, the complexity measure in 
accordance with one embodiment of the present invention is relatively invariant with the 
quantization parameter (QP) used. In particular, generally the bit count for non-texture 
information, such as frame headers/syntax and motion vectors, denoted by H_i, is constant or 
almost constant in terms of quantization parameter (QP) change. This is in contrast to the bit 
count of the texture information, which does vary with a change in QP. By way of example, 
based on the MPEG-4 VM R-Q model [MPEG4VM], if the total number of bits used for 
coding the current frame i is R_i, then the texture bits T_i = R_i - H_i can be represented as: 

$$\frac{R_i - H_i}{M_i} = \frac{a_1}{Q_i} + \frac{a_2}{Q_i^2} \qquad \text{Equation 10}$$

[0098] where M_i is the MAD computed with a motion-compensated residual that 
is substantially invariant with respect to the QP (i.e., Q_i), and a_1 and a_2 are Taylor expansion 
coefficients of texture bits over QP. The coefficients a_1 and a_2 are normally of the same 
order, that is, have similar values. As can be seen, the lower the QP, the greater the number 
of texture bits needed to encode a given frame. 

[0099] The complexity measure C_{g,i} addresses both the motion and texture bit 
count, and is substantially QP invariant. In one embodiment, C_{g,i} is defined by the ratio of 
the texture bit count for a given frame to the average texture bit count, and the ratio of the 
motion vector bit count for the given frame to the average motion vector bit count, as 
follows: 

$$C_{g,i} = \frac{R_{g,i} - H_{g,i}}{\overline{R_g - H_g}} + \frac{MV_{(g,i)}}{\overline{MV_g}} \qquad \text{Equation 11}$$

[0100] where MV_{(g,i)} is the motion vector bit count for frame (g,i), $\overline{MV_g}$ is the 
average motion vector bit count, and $\overline{R_g - H_g}$ is the average texture bit count. Because the 
obtained complexity measure C_{g,i} is substantially QP invariant, the frame complexity can be 
generated in one pass, that is, with one QP. The calculated frame complexity will be utilized 
in the second encoding pass as will now be described. 

[0101] As illustrated in Figure 9, the rate control process 900 in the second pass 
consists of a three level hierarchy, scene (GOV or GOP)-level bit allocation 902, frame-level 
bit allocation 904, and macroblock-level QP adjustment 906 which utilizes the obtained 
frame complexity values C_{g,i}. 

[0102] Complexity-guided scene, GOV, or GOP, level bit allocation will now be 
described with reference to Figure 10. To assign bits to each scene, the complexity measure 
is defined for each GOV or GOP by calculating the average spatial complexity C_g as 
follows: 

$$C_g = \frac{\sum_i C_{g,i}}{N_g} \qquad \text{Equation 12}$$

[0103] Then, the GOV or GOP level recursive bit allocation process 1000 is 
applied, as illustrated in Figure 10. At state 1002, an initialization process is performed, with 
the following assignments: 

Bit budget (bit allocation for a given time window corresponding to a certain 
number of GOVs or GOPs) = B 

The initial transmit buffer fullness β_1 = T_d × R 

Start from GOV or GOP of index 1, that is, the first GOV or GOP. 

[0104] At state 1004, bits are assigned to the scene (GOV or GOP) of index g 
according to the following formula: 

$$B_t(g) = \lambda \times (R/F) \times N(g) + (1-\lambda) \times \frac{C_g \times N(g)}{\sum_i C_i \times N(i)} \times B \qquad \text{Equation 13}$$

[0105] Where: 

R = the channel rate 

F = the selected frame rate 

N(g) = the number of frames in GOV or GOP of index g 

λ = the weighting factor between the buffer variation and complexity demands, 
and 

$\sum_i C_i \times N(i)$ defines the total scene complexity for the given time window corresponding to the 
GOVs or GOPs under consideration 

[0106] The case of λ=0 represents the bit allocation scheme that directly follows 
the frame complexity, which is preferred if the transmit buffer constraints can be met. The 
assignment λ=1.0 represents the case where the bit budget is evenly distributed without 
considering the frame complexity. In this case, little pre-loading and a small capacity of the 
decoder buffer are needed as only the first frame needs to be pre-fetched. The case with 
0<λ<1.0 represents a bit-allocation tradeoff between the buffer and the quality constraints. 

[0107] At state 1006, the buffer status is inspected with the tentatively assigned 
bit budget B_t(g). If 

β_{g-1} + B_t(g) - (R/F) × N(g) < MarginFactor1 × β_max 

where, by way of example, MarginFactor1 = 0.8, which provides a safe margin (0.8 of 
the maximum buffer size) for buffer regulation, 

[0108] then the allocation is accepted, and the process 1000 proceeds to state 
1008. Otherwise, the value of λ is adjusted upwards by a factor, such as 0.1, and the process 
1000 proceeds back to state 1004. 

[0109] At state 1008 the buffer status is updated as follows: 

β_g = β_{g-1} + B_t(g) - (R/F) × N(g), 

and the remaining budget is adjusted as follows: 

[0110] The process 1000 then proceeds back to state 1002 and the value of λ is 
set equal to 0. The allocation for the next GOV or GOP of index g+1 is then performed. 
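
The loop through states 1004 to 1008 for a single GOV can be sketched as follows. This is a sketch under several assumptions: the complexity term of Equation 13 is supplied as a precomputed fraction `complexity_share` rather than derived here, the upward adjustment of the weighting factor is treated as an additive step of 0.1, and the loop stops once the weighting factor reaches 1.0. All names are hypothetical.

```python
def allocate_gov_bits(budget, channel_rate, frame_rate, n_frames,
                      complexity_share, buffer_prev, buffer_max,
                      margin_factor=0.8, step=0.1):
    """Return (bits, buffer_fullness, lam) for one GOV or GOP.

    complexity_share: assumed fraction of the window's total scene
    complexity attributed to this GOV (a hypothetical input).
    """
    drained = (channel_rate / frame_rate) * n_frames  # bits sent meanwhile
    lam = 0.0
    while True:
        # State 1004: blend the even share with the complexity-driven share.
        bits = lam * drained + (1.0 - lam) * complexity_share * budget
        # State 1006: tentative buffer fullness after this GOV.
        fullness = buffer_prev + bits - drained
        if fullness < margin_factor * buffer_max or lam >= 1.0:
            return bits, fullness, lam  # state 1008: accept and update
        lam = min(1.0, round(lam + step, 10))


bits, fullness, lam = allocate_gov_bits(
    budget=400000, channel_rate=64000, frame_rate=8, n_frames=16,
    complexity_share=0.5, buffer_prev=0, buffer_max=50000)
print(bits, fullness, lam)
```

In this run the complexity-only allocation overfills the buffer, so the weighting factor is raised step by step until the tentative fullness drops below the 0.8 margin.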

[0111] The GOV or GOP scene-level bit allocation advantageously suitably 
allocates the bit budget to each GOV or GOP while meeting both the buffer and the quality 
constraints. However, to obtain a constant or consistent quality within each GOV or GOP, it 
is preferable to allocate the bit budget according to frames within each GOV or GOP based 
on frame complexity and while still meeting buffer constraints. The frame level bit 
allocation process 904 illustrated in Figure 9 performs such a frame bit allocation process, 
and is similar to the process 1000 for the GOV or GOP level bit allocation. However, rather 
than using variables corresponding to a GOP or a GOV, the variables corresponding to 
frames are utilized. Thus, for frames 

$$B_f = \lambda \times (R/F) + (1-\lambda) \times C_{(g,i)} \times \frac{B_g'}{C_g'} \qquad \text{Equation 14}$$

where: 

B_g' = running bits = (Bit allocation for current GOP - used bits for frames already 
coded) 

C_g' = running complexity = (complexity for current GOP - complexity of frames 
already coded) 

R = target bit rate 

F = frame rate 

[0112] In one embodiment, the macroblock level QP adjustment 906 can be 
alternatively performed in accordance with the following description. In the scene and the 
frame level bit allocation processes described above, a safe margin (0.8 of the maximal buffer 
by way of example) is provided for buffer regulation. To reduce computational complexity, 
in one alternative embodiment, all the macroblocks are quantized with the same quantization 
parameter (QP) using a one-pass rate control. The QP value is determined as part of the 
frame level rate control using the following iterative process: 

[0113] if B_actual(i) > 1.15 * B_t(i), then QP_{i+1} = QP_i + 1, 

else if B_actual(i) < 0.85 * B_t(i), then QP_{i+1} = QP_i - 1, 

else QP_{i+1} = QP_i. 

// To ensure that QP_{i+1} is within the valid QP range of 1-31, the 
// following clipping operation is performed 

QP_{i+1} = max(QP_{i+1}, 1); 

QP_{i+1} = min(QP_{i+1}, 31). 
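
The frame-level QP update above translates directly into a small function. The function name and argument names are hypothetical; the 1.15 and 0.85 factors and the 1-31 clipping range are taken from the text.

```python
def next_frame_qp(qp, actual_bits, target_bits):
    """QP for the next frame, from the bits used versus the bits allotted."""
    if actual_bits > 1.15 * target_bits:
        qp += 1   # overspent: quantize more coarsely
    elif actual_bits < 0.85 * target_bits:
        qp -= 1   # underspent: quantize more finely
    return min(31, max(1, qp))  # clip to the valid QP range of 1-31


print(next_frame_qp(10, 120, 100))  # prints 11
print(next_frame_qp(10, 80, 100))   # prints 9
```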

[0114] However, if the buffer is relatively small, then the macroblock level rate 
control process as described below can be used. 

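[0115] Suppose N_mb is the number of macroblocks in one frame, MAD_k is the 
mean absolute difference of MB k, and QP_{k-1} is the QP for a previous macroblock. Then the 
QP for the current MB k can be in the range of [QP_{k-1}-2, QP_{k-1}+2], as determined in 
accordance with the following rules: 

$$QP_k = \begin{cases}
QP_{k-1}+2 & \text{if } R_{k-1} > 1.5 \times B_{k-1} \times \left( MAD_{k-1} / \sum_{j}^{N_{mb}} MAD_j \right) \text{, else} \\
QP_{k-1}+1 & \text{if } R_{k-1} > 1.25 \times B_{k-1} \times \left( MAD_{k-1} / \sum_{j}^{N_{mb}} MAD_j \right) \text{, else} \\
QP_{k-1}-2 & \text{if } R_{k-1} < 0.67 \times B_{k-1} \times \left( MAD_{k-1} / \sum_{j}^{N_{mb}} MAD_j \right) \text{, else} \\
QP_{k-1}-1 & \text{if } R_{k-1} < 0.8 \times B_{k-1} \times \left( MAD_{k-1} / \sum_{j}^{N_{mb}} MAD_j \right) \text{, else} \\
QP_{k-1} &
\end{cases} \qquad \text{Equation 15}$$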



[0116] Optionally, in a high noise environment, the macroblock bit allocation 
process can be disabled or not used so that the decoder can assume that the QP is the same 
for each macroblock. This helps prevent the decoder from using the wrong QP when 
portions of a frame have been corrupted or lost. 

[0117] After determining the bit budget for each GOV or GOP, a novel rate 
control process, also referred to as IMP4, is used to meet or substantially meet the calculated 
bit budget. Conventional MPEG-4 rate control does not provide adequate performance for many 
applications. For example, many conventional rate control processes do not explicitly 
support multiple scenes. Instead, these conventional rate control processes assume that an 
entire sequence comprises a single scene, and therefore fail to provide for satisfactory rate 
control. By contrast, an embodiment of the present invention takes into account when scene 
changes occur, and so provides enhanced rate control. 

[0118] In one embodiment of the present invention, a self-converging rate control 
process is used to meet the bit budget of each GOV by adjusting the quantization 
parameter QP of each frame, where QP is equal to half the quantization step size. In MPEG-4, 
the quantizer parameter QP can have 31 values [1-31]. The rate control process determines 
the QP based on past bit usage, the number of un-coded frames, and the remaining bits for a given 
GOV. Thus, if the current bit usage exceeds the assigned bit budget by more than a certain 
amount or percentage, the quantization parameter, and therefore the quantization step size, 
are increased. If, instead, the current bit usage falls beneath the assigned bit budget by more 
than a certain amount or percentage, the quantization parameter, and therefore the 
quantization step size, are decreased. The process can be performed in accordance with the 
following example pseudo-code: 

if (current bit usage > assigned bit budget * Margin1) 
    QP_next = min (31, INT(QP_cur + StepUpPrcnt * QP_cur)); 
Else if (current bit usage < assigned bit budget * Margin2) 
    QP_next = max (1, INT(QP_cur - StepDwnPrcnt * QP_cur)); 
End if. 
Where: 

Margin1 is a constant that allows the current bit usage to exceed the assigned 
bit budget, so that the system has an opportunity to stabilize. For example, 
Margin1 can be set equal to 1.15, allowing the current bit usage to exceed the 
assigned bit budget by 15%. 






Margin2 is a constant that allows the current bit usage to under-run the 
assigned bit budget, so that the system has an opportunity to stabilize. For 
example, Margin2 can be set equal to 0.85, allowing the current bit usage to 
under-run the assigned bit budget by 15%. 

StepUpPrcnt is a constant related to how much the quantization parameter is to 
be increased. For example, StepUpPrcnt may be set equal to 0.1. 

StepDwnPrcnt is a constant related to how much the quantization parameter is 
to be decreased. For example, StepDwnPrcnt may be set equal to 0.1. 
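
The pseudo-code above can be sketched in runnable form as follows. This is a minimal illustration only: the function name next_qp and its argument names are hypothetical, INT is assumed to truncate toward zero, and the default constants are the example values given above.

```python
def next_qp(qp_cur, bits_used, bits_budget,
            margin1=1.15, margin2=0.85,
            step_up_prcnt=0.1, step_dwn_prcnt=0.1):
    """Self-converging QP update for the next frame of a GOV (illustrative sketch)."""
    if bits_used > bits_budget * margin1:
        # Over budget by more than Margin1: raise QP by a percentage, cap at 31.
        return min(31, int(qp_cur + step_up_prcnt * qp_cur))
    if bits_used < bits_budget * margin2:
        # Under budget by more than Margin2: lower QP by a percentage, floor at 1.
        return max(1, int(qp_cur - step_dwn_prcnt * qp_cur))
    # Within the tolerated band: leave QP unchanged so the system can stabilize.
    return qp_cur
```

Under the truncation assumption, a percentage step of 0.1 does not raise a QP below 10 (for example, INT(5 + 0.5) is 5), so an implementation may wish to enforce a minimum step of one; that refinement is not part of the pseudo-code above.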

[0119] After encoding each GOV, the actual bit usage B_act is compared with the 
pre-assigned bit budget B_ass, and if the actual bit usage varies by more than a certain amount 
or percentage (Margin3, Margin4) from the budgeted bit usage, the quantization parameter 
QP for an I-frame (QPI) is adjusted up or down by a certain amount (StepUp, StepDwn) or 
percentage as needed. The quantization parameter adjustment process can be performed in 
accordance with the following example pseudo-code: 



If (B_act > Margin3 * B_ass) 
    QPI = QPI + StepUp; 
Else if (B_act < Margin4 * B_ass) 
    QPI = QPI - StepDwn; 
End if 

Where, by way of example: 
Margin3 = 1.2 
Margin4 = 0.8 
StepUp = 2 
StepDwn = 2 
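
A minimal sketch of the GOV level adjustment of QPI follows. The function and argument names are hypothetical. Two assumptions are made that are not stated in the pseudo-code: StepDwn is treated as a positive magnitude (because it is subtracted), and the result is clipped to the valid QP range of 1-31.

```python
def adjust_qpi(qpi, b_act, b_ass,
               margin3=1.2, margin4=0.8, step_up=2, step_dwn=2):
    """Adjust the I-frame quantization parameter after a GOV has been encoded."""
    if b_act > margin3 * b_ass:
        qpi += step_up       # GOV used too many bits: coarser I-frames
    elif b_act < margin4 * b_ass:
        qpi -= step_dwn      # GOV used too few bits: finer I-frames
    # Assumption: keep QPI inside the valid MPEG-4 QP range.
    return min(31, max(1, qpi))
```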

[0120] The initial value of QPI may be set to 10, while the initial value of QP for 
a P-VOP may be set to 12. Note that when the quantization parameter QPI for an I-VOP is 
changed, the QP assignment of the following P-frames may also change. 

[0121] In one embodiment, the encoder module 106B then performs adaptive 
motion change detection to efficiently reduce large propagation errors. In particular, adaptive 
intra refresh (AIR) is used to reduce error propagation in an MPEG data stream by the 
selective intra-coding of macroblocks in P-frames. Thus, AIR is used to help determine how 
many macroblocks should be intra-encoded in the detected motion region of a frame. While 
the performance of macroblock intra-refresh increases error resilience of the compressed 
bitstream, increasing the number of intra-coded macroblocks correspondingly increases the 
number of bits used to encode these macroblocks. Further, if there is a fixed bit rate, the 
quantization error has to increase for the other, non-intracoded, macroblocks. Thus, 
preferably, bandwidth and the bit error probability (BER) are taken into account to determine 
the percentage or number of macroblocks that are to be intracoded. 

[0122] In addition, the encoder module 106B optionally uses Cyclic Intra Refresh 
(CIR) to encode a predetermined number of macroblocks (MBs) in each frame. Thus, CIR 
provides periodic intra-refresh to constrain possible error propagation. 

[0123] In one embodiment, Cyclic Intra Refresh (CIR) and Adaptive Intra 
Refresh (AIR) are performed as follows. The number of Intra macroblocks in a VOP is 
specified by the user in the encoder parameter file. The number of Intra macroblocks in a 
VOP depends on target bit rate, frame rate, bit buffer usage, channel noise feedback, and 
other transmission related parameters. The encoder module 106B estimates the amount of 
motion for each macroblock and selects heavy-motion areas to be encoded in INTRA mode to 
enhance error resiliency. The results of the estimation are recorded in a refresh map at the 
macroblock level. 

[0124] An example conventional refresh map 202 is illustrated in Figure 2A. The 
encoder module 106B refers to the refresh map and selectively determines whether to encode 
a given macroblock of the current VOP in INTRA mode or not. The estimation of motion is 
performed by comparing SAD (Sum of the Absolute Differences) and SAD_th. SAD refers to 
the Sum of the Absolute Differences value between the current macroblock and the 
macroblock in the same location of the previous VOP. Advantageously, the SAD is already 
calculated when performing motion estimation. Therefore, the SAD calculation does not 
have to be repeated as part of the AIR process. SAD_th is used as a threshold value in 
determining whether a given macroblock is a motion area. If the SAD of the current 
macroblock is larger than SAD_th, this macroblock is regarded as a motion area. 

[0125] Once a macroblock is regarded as a motion area, it remains as a candidate 
motion area until it is encoded in Intra mode a predetermined number of times. In the 
example refresh map 202 illustrated in Figure 2A, the value for this "predetermined number 
of times" is set to "1"; in other embodiments, the predetermined number of times can be set 
equal to 2, or a higher value. Horizontal scanning is used to resolve among macroblocks that 
are candidates to be encoded in Intra mode within the moving area as illustrated in map 204 
in Figure 2B. 
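
The refresh map bookkeeping described above can be sketched as follows, by way of example only. The function air_step and its argument names are hypothetical. It is assumed here that the horizontal scan restarts at the top-left macroblock for every VOP and that the map is stored as a list of rows; neither detail is fixed by the description.

```python
def air_step(refresh_map, sad, sad_th, intra_per_vop, refresh_count=1):
    """Process one P-VOP and return the (row, col) macroblocks coded in Intra mode.

    refresh_map   -- 2-D list; entry > 0 means the macroblock awaits Intra refresh
    sad           -- 2-D list of SAD values against the co-located previous macroblock
    sad_th        -- motion threshold SAD_th
    intra_per_vop -- fixed number of Intra macroblocks allowed in this VOP
    refresh_count -- the "predetermined number of times" (1 in the example map)
    """
    intra = []
    # Horizontal scan: refresh pending macroblocks until the per-VOP limit is hit.
    for r, row in enumerate(refresh_map):
        for c, pending in enumerate(row):
            if pending > 0 and len(intra) < intra_per_vop:
                intra.append((r, c))
                row[c] -= 1
    # Motion detection: macroblocks whose SAD exceeds SAD_th become candidates.
    for r, row in enumerate(sad):
        for c, value in enumerate(row):
            if value > sad_th:
                refresh_map[r][c] = refresh_count
    return intra
```

With a limit of two Intra macroblocks per VOP, three macroblocks flagged as motion area in one VOP are refreshed over the following two VOPs, two in the first and one in the second.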

[0126] With reference to Figure 8A, conventional AIR processing, as applied to 
four sample sequential VOPs, will now be explained in greater detail. The AIR refresh rate, 
that is, the fixed number of Intra macroblocks in a VOP, is preferably determined in advance. 
In this example, the number of Intra macroblocks in a VOP is set to "2". 

[0127] [1] 1st VOP - Figure 8A [a] and [b] 

[0128] The first VOP is a scene change frame containing elements 802, 804. 
Therefore, all macroblocks in the 1st VOP are encoded in Intra mode, as illustrated in Figure 
8A [a]. As illustrated in Figure 8A [b], every entry of the refresh map is set to "0", because 
the 1st VOP is encoded without reference to a previous VOP. A 0 indicates that an Intra 
refresh is not to be performed and a 1 indicates that an Intra refresh is to be performed. 

[0129] [2] 2nd VOP - Figure 8A [c] - [f] 

[0130] The 2nd VOP is intercoded as a P-VOP. Elements 802, 804 have moved 
down one macroblock and to the right by one macroblock. Intra refresh is not performed in 
this VOP, because all values in the refresh map are still zero, as illustrated in Figure 8A [c]. 
The encoder module 106B estimates motion of each macroblock. If the SAD for a given 
macroblock is larger than SAD_th, the given macroblock is regarded as a motion area, illustrated 
by the hatched area in Figure 8A [e]; thus, the refresh map is updated as illustrated in Figure 
8A [f], where the refresh map entry corresponding to a motion macroblock is set to 1. 

[0131] [3] 3rd VOP - Figure 8A [g] - [k] 

[0132] Elements 802, 804 have moved down by an additional macroblock and to 
the right by an additional macroblock. When the 3rd VOP is encoded, the encoder module 
106B refers to the Refresh Map illustrated in Figure 8A [g]. If the refresh map indicates that 
a macroblock is to be Intra refreshed, the macroblock is encoded in Intra mode, as illustrated by 
the macroblocks containing an "X" in Figure 8A [h]. The corresponding refresh map value 
for an intracoded macroblock is decreased by 1 as illustrated in Figure 8A [i]. 

[0133] If the decreased value is 0, the corresponding macroblock is not regarded 
as a motion area. Proceeding, the processing is substantially the same as that for the 2nd 
VOP as illustrated in Figure 8A [j]-[k], where if the SAD for a given macroblock is larger 
than SAD_th, the given macroblock is regarded as a motion area. The refresh map is updated as 
illustrated in Figure 8A [k], where the refresh map entry corresponding to a motion 
macroblock is set to 1. 

[0134] [4] 4th VOP - Figure 8A [l] - [p] 

[0135] The processing is substantially the same as for the 3rd VOP. If a current 
macroblock has a 1 associated with it in the refresh map, it is encoded in Intra mode as 
illustrated by the macroblocks containing an "X" in Figure 8A [m]. The corresponding 
refresh map value for an intracoded macroblock is decreased by 1 as illustrated in Figure 
8A [n]. 

[0136] If the decreased value is 0, the corresponding macroblock is not regarded 
as a motion area. If the SAD for a given macroblock is larger than SAD_th, the given 
macroblock is regarded as a motion area. The refresh map is updated as illustrated in Figure 
8A [p]. 

[0137] In another embodiment, a novel enhanced AIR process is performed as 
follows to select which macroblocks are to be intracoded in a predicted frame. An intercode 
distortion value and an intracode distortion value are calculated, as are an intercode bit rate 
and an intracode bit rate. Based on a comparison of the calculated intercode distortion value 
and the intracode distortion value, and on a comparison of the intercode bit rate and the 
intracode bit rate for each macroblock, a decision is made as to which macroblocks are to be 
intracoded. The enhanced AIR process will now be described in greater detail. 

[0138] In order to select which prediction frame macroblock is to be intracoded, 
the expected distortion that would result if the macroblock were lost or corrupted is 
estimated. For a predicted or intracoded macroblock, the distortion can be reduced if the 
reference macroblock in the prediction is intracoded. 

[0139] With reference to Figure 8B, recursive tracking in conjunction with the 
prediction path can be used to determine the expected distortion of the macroblock. The 
dashed lines 804B to 818B, 806B to 820B, 820B to 826B, 812B to 822B, 814B to 824B, 
822B to 828B, and 828B to 830B, indicate motion vectors (MV) that are part of the encoded 
bitstream from a macroblock in the previous frame to a macroblock in the current frame. The 
angled solid lines, such as those from 802B to 818B, 806B to 820B, 810B to 822B, 816B to 
824B, 818B to 826B, 824B to 828B, and 826B to 830B, indicate a zero-motion vector, where 
a lost motion vector is set to zero. Zero motion vectors are used by a decoder in an error 
condition, wherein the decoder, for error concealment, replaces an unrecoverable macroblock 
with a corresponding macroblock from a previous frame. Note that this is just one of the 
available error concealment strategies, which is termed "basic concealment." Other 
concealment strategies, such as temporal concealment or supplemental motion vector 
concealment, can be performed as well. Optionally, the effects of these other error 
concealment strategies are considered separately, in parallel, when performing the recursive 
tracking. Referring to Figure 8B, "p" is the packet loss probability or rate, and q = (1-p). 

[0140] While encoding a given current macroblock, the encoder module 106B 
performs a motion search on the previous frame and locates a macroblock that most closely 
matches the current macroblock or is otherwise determined to be a good prediction frame. 
This located macroblock from the previous frame, depicted by a non-hatched circle, such as 
macroblocks 802B, 806B, 810B, 816B, 818B, 820B, 826B, is called a prediction 
macroblock. After the motion search is performed, a residual error is calculated and further 
encoded using the Discrete Cosine Transform (DCT), then quantized using a selected 
quantization step or quantization parameter (QP), and entropy coded using variable length 
coding (VLC). The encoded bitstream consists of motion vector information, entropy coded 
quantized DCT coefficients for the residual error, and corresponding header information. 

[0141] When the decoder receives the encoded bitstream, the decoder processes 
the coded information and reconstructs the macroblocks. When information for a 
macroblock is missing, which may be due to packet loss or other error conditions, the 
decoder preferably conceals the corresponding macroblock using one or more error-concealment 
strategies, such as the basic concealment discussed above. As discussed above, 
when a macroblock is missing, basic concealment copies a macroblock at the same spatial 
location from the previous frame. This is equivalent to receiving a zero-motion vector and 
zero DCT coefficients. 

[0142] In order to determine which macroblocks should be intracoded, in one 
embodiment the encoder system 100 includes corresponding decoder circuitry so that it can 
mimic the decoder process and reconstruct what the decoder will reconstruct both in the 
absence of errors, and in the presence of one or more errors, such as a single error affecting 
just the current macroblock ("MBC"). By way of example, the difference between the error-free 
reconstruction and the reconstruction assuming one error is termed "concealment error" 
or EC. EC is defined as follows: 

[0143] EC = MBQ - MBC     Equation 16 

[0144] where MBQ is the error-free reconstruction, and MBC is the single-error 
reconstruction. 

[0145] When a given macroblock is used as a prediction macroblock for the next 
frame, an error present on the given macroblock will propagate to those macroblocks in the 
next frame that use the given macroblock for prediction purposes, even when there is no 
further error in motion vectors and DCT coefficients for those next-frame macroblocks. The 
mechanism with which error propagates from a macroblock in a given frame to other 
macroblocks in the next frame is termed "the error propagation model." 

[0146] Error attenuation occurs when half-pixel accuracy is used for prediction 
either in the vertical or horizontal direction or in both the vertical and the horizontal 
directions. Error attenuation, comparable to a low pass filter, occurs as a result of the low- 
pass frequency characteristic of the pixel averaging operation applied when half-pixel motion 
is used. Thus, given the concealment error EC calculated at the encoder system 100B, the 
propagated error via half-pixel motion in the horizontal direction ECh/2, the propagated error 
via half-pixel motion in the vertical direction ECv/2, and the propagated error via half-pixel 
motion in the horizontal and vertical direction EChv/2, can be determined. 

[0147] Half pixel interpolation is illustrated in Figure 8D, showing integer pixel 
locations, half-pixel locations in the horizontal direction, half-pixel locations in the vertical 
direction, and half-pixel locations in the horizontal and vertical dimension. 

[0148] The half-pixel averaging filter that is normally applied to pixel values can 
be applied to the concealment error, EC, to define four types of propagated error arrays: 

[0149] EC0 = EC 

[0150] ECh/2 = error through horizontal half-pixel motion 
(value calculated on crosses "x" in Figure 8D) 

[0151] ECv/2 = error through vertical half-pixel motion 
(value calculated on diamonds in Figure 8D) 

[0152] EChv/2 = error through horizontal and vertical half-pixel motion 
(value calculated on squares in Figure 8D) 

[0153] For each of the four error arrays, the corresponding energy, which 
approximates the error variance under the hypothesis of zero mean, is calculated. 

[0154] The four error variances for these four cases can correspondingly be 
defined as: 

\sigma^2_{EC}, \sigma^2_{ECh/2}, \sigma^2_{ECv/2} and \sigma^2_{EChv/2}     Equation 17 

[0155] The following four transition or strength factors can then be defined as: 

\gamma_{EC} = \sigma^2_{EC} / \sigma^2_{EC} - 1 = 0, 
\gamma_{h/2} = \sigma^2_{EC} / \sigma^2_{ECh/2} - 1, 
\gamma_{v/2} = \sigma^2_{EC} / \sigma^2_{ECv/2} - 1, and 
\gamma_{hv/2} = \sigma^2_{EC} / \sigma^2_{EChv/2} - 1     Equation 18 

which correspond to the four possible cases of motion for the current macroblock. These 
quantities are saved, together with the motion vector that is used to encode the current 
macroblock, (mv_x, mv_y), the initial error energy \sigma^2_{EC}, and the coding mode (Intra/Inter), in a 
table, file or other record. 

[0156] The half-pixel horizontal and vertical propagation strength can be 
approximated as follows: 

\gamma_{hv/2} = \gamma_{h/2} + \gamma_{v/2} + \gamma_{h/2} \gamma_{v/2}     Equation 19 

[0157] using the transition factors of half-pixel horizontal and half-pixel vertical 
motion, thereby reducing the computation time and resources needed to calculate half-pixel 
horizontal and vertical propagation strength or transition factor. In addition, the propagation 
strengths should be positive. Therefore, a negative propagation strength result will be 
rounded or set to zero. 
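
The calculation of a transition factor from a concealment error array can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the encoder's actual implementation: the helper names are hypothetical, the half-pixel filter is taken to be a plain two-tap (or four-tap) average without the rounding used for pixel values, the filtered array covers only interior positions, and each factor is taken to be the ratio of the unfiltered error energy to the filtered error energy, minus one.

```python
def half_pel(ec, dx, dy):
    """Average each sample with its neighbour dx columns right and/or dy rows down.

    (dx, dy) = (1, 0): horizontal half-pixel; (0, 1): vertical; (1, 1): both.
    """
    rows, cols = len(ec) - dy, len(ec[0]) - dx
    return [[(ec[y][x] + ec[y][x + dx] + ec[y + dy][x] + ec[y + dy][x + dx]) / 4.0
             for x in range(cols)] for y in range(rows)]


def energy(arr):
    """Mean squared value, which approximates the variance for a zero-mean error."""
    count = sum(len(row) for row in arr)
    return sum(v * v for row in arr for v in row) / count


def transition_factor(ec, dx, dy):
    """Propagation strength for the given half-pixel motion, never negative."""
    e0 = energy(ec)
    if e0 == 0.0:
        return 0.0               # no concealment error to attenuate
    e = energy(half_pel(ec, dx, dy))
    if e == 0.0:
        return float("inf")      # error is removed entirely by the averaging
    return max(0.0, e0 / e - 1.0)  # negative results are set to zero
```

A spatially constant error is not attenuated by averaging, so all factors are zero, whereas an error that alternates in sign from column to column is cancelled by horizontal half-pixel averaging.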

[0158] As illustrated in Figure 8E, a motion vector MV can map macroblocks in 
the current frame Frame_n, aligned with a grid of 16-pixel rows and columns, into 16x16 
pixels in the predicted frame Frame_{n-1} that are not necessarily aligned on the same grid. 
Indeed, as illustrated in Figure 8E, a macroblock in Frame_n can map to portions of up to four 
macroblocks in the predicted frame Frame_{n-1}. 

[0159] An error present on one or more of the four possible macroblocks from the 
previous frame used for prediction for a macroblock in the current frame will be reflected in 
the macroblock in the current frame. The error relationship can be proportional to the 
overlap area. For example, the error relationship can be proportional to or based on the number 
of pixels that they overlap. Thus, for each macroblock in the current frame, the up to four 
prediction macroblocks are identified that would be used when encoding the macroblock in 
Inter mode. Using the corresponding motion vector information, the overlapping area is 
determined, and a weighting factor equal or related to that area is used to normalize the 
overlap area to the total macroblock area, 256 (=16x16) for example, as defined by the 
following equation: 

\sigma^2_u(i,j) = w(i,j) \times \sigma^2_u(i) / (1 + \gamma_{i,j}) 

that estimates the expected distortion on macroblock j in the current frame due to macroblock i 
in the previous frame. Note that \gamma_{i,j} is one of the transition factors \gamma_{EC}, \gamma_{h/2}, \gamma_{v/2} and 
\gamma_{hv/2} defined previously, depending on the type of motion (half or integer pixel motion vector) 
along horizontal, vertical or both directions, and w(i,j) = (w1 \times h1) / 256 is the weighting factor 
relating the area of overlap (w1 x h1) between macroblock j and macroblock i. The term 
\sigma^2_u(i) is the concealment error \sigma^2_{EC} for macroblock i. 
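
By way of example, the overlap weighting can be computed as sketched below. The function overlap_weights is hypothetical; it assumes motion vectors in whole-pixel units, a 16x16 macroblock grid, and ignores frame boundaries.

```python
MB = 16  # macroblock width and height in pixels


def overlap_weights(mb_col, mb_row, mv_x, mv_y):
    """Map each overlapped grid-aligned macroblock of the previous frame to its weight.

    The current macroblock at grid position (mb_col, mb_row) is displaced by
    (mv_x, mv_y) pixels; each weight is the overlap area (w1 x h1) divided by 256.
    """
    x0 = mb_col * MB + mv_x   # left edge of the prediction block
    y0 = mb_row * MB + mv_y   # top edge of the prediction block
    weights = {}
    for col in {x0 // MB, (x0 + MB - 1) // MB}:
        for row in {y0 // MB, (y0 + MB - 1) // MB}:
            w1 = min(x0 + MB, (col + 1) * MB) - max(x0, col * MB)
            h1 = min(y0 + MB, (row + 1) * MB) - max(y0, row * MB)
            if w1 > 0 and h1 > 0:
                weights[(col, row)] = (w1 * h1) / float(MB * MB)
    return weights
```

For a macroblock at grid position (1, 1) and a motion vector of (4, 8), four macroblocks are overlapped with weights 0.375, 0.125, 0.375 and 0.125, which sum to one.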

[0160] With reference to Figure 8B, beginning with the macroblock 830B in 
current Frame_n, there are two macroblocks in Frame_{n-1} that may be used by a decoder to 
recreate macroblock 830B, a macroblock 826B used for normal decoding, and a macroblock 
828B used for concealment. Each of the macroblocks 826B, 828B in Frame_{n-1} may 
correspond to up to 4 aligned macroblocks, as discussed above. The same "decode or 
conceal" strategy can be recursively applied for the two macroblocks 826B, 828B in Frame_{n-1} 
to locate 4 macroblocks 818B, 824B, 822B, 829B in Frame_{n-2}, and then reach Frame_{n-3} 
with 8 macroblocks 802B, 804B, 806B, 808B, 810B, 812B, 814B, 816B, and so on. Each of 
the 8 macroblocks in Frame_{n-3} has a probability of appearing in Frame_n at the current 
macroblock, if a certain series of errors/packet loss occurs during transmission. The 
probability of each one of these 8 paths is the product of the corresponding branch 
probabilities (p or q), where p is the packet loss probability and q = (1-p). The probability of a 
particular path to the macroblock 830B occurring can be determined by multiplying the p and 
q values along the path. Thus, there exist paths that have probability p^2, such as those where 
two packet losses in a row occur, and a path defined by 812B-830B with probability p^3. 
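
The path probabilities can be enumerated as in the following sketch, which is offered only as an illustration. The function name and the labels "d" (macroblock decoded normally, probability q) and "c" (macroblock concealed after a loss, probability p) are hypothetical.

```python
from itertools import product


def path_probabilities(p, depth=3):
    """Return {path label: probability} for every decode/conceal path of the given depth."""
    q = 1.0 - p
    paths = {}
    for branches in product("dc", repeat=depth):
        prob = 1.0
        for branch in branches:
            # Multiply the branch probabilities along the path.
            prob *= p if branch == "c" else q
        paths["".join(branches)] = prob
    return paths
```

With a depth of three frames there are eight paths whose probabilities sum to one; the path with three consecutive losses has probability p^3, and the loss-free path has probability q^3.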

[0161] Assuming a relatively small probability of error (e.g., 0.1), the higher-order 
(with respect to p) paths, such as those with a probability of p^2 or p^3, can be neglected, and 
Figure 8B can thereby be simplified to the paths illustrated in Figure 8F. The reduction in 
paths in Figure 8B is based on an assumption that a macroblock that is to be used for 
concealment is not itself corrupted, that is, the probability of multiple errors/packet losses 
on a certain path between two macroblocks is neglected. While this assumption may not 
always be true, it will most often be true. 

[0162] Based on this simplified macroblock relationship, the expected distortion 
for the current macroblock in Frame_n can be estimated using the propagation model described 
above. The expected distortion is defined using the following equation: 

D(n) = p \sigma^2_{EC}(n) + q D'(n-1)     Equation 20 

[0163] where D'(n-1) is the expected distortion for the reference macroblocks in 
Frame_{n-1}, as modified by the transition factors to take into account the possible half-pixel 
motion from Frame_{n-1} to Frame_n. Expanding this formula for the reference macroblock in 
Frame_{n-1}, expected distortion is defined as follows: 

D(n) = p \sigma^2_{EC}(n) + q ( p \sigma^2_{EC}(n-1) / (1 + \gamma_{(n-1,n)}) + q D''(n-2) )     Equation 21 

where \gamma_{(n-1,n)} is one of the 4 transition factors (\gamma_{EC}, \gamma_{h/2}, \gamma_{v/2} and \gamma_{hv/2}) for the reference 
macroblock in Frame_{n-1}, depending on the motion vector from Frame_{n-1} to Frame_n. As 
similarly described above, D''(n-2) is the expected distortion for the reference macroblocks 
in Frame_{n-2} modified by the transition factors to take into account the possible half-pixel 
motion from Frame_{n-2} to Frame_{n-1} and from Frame_{n-1} to Frame_n. Expanding this term further, 
the expected distortion is defined as: 

D(n) = p \sigma^2_{EC}(n) + q ( p \sigma^2_{EC}(n-1) / (1 + \gamma_{(n-1,n)}) 
       + q ( p \sigma^2_{EC}(n-2) / (1 + \gamma_{(n-2,n-1)} + \gamma_{(n-1,n)}) + q D'''(n-3) ) )     Equation 22 

[0164] If Frame_{n-3} is an I-frame or if the frame buffer is limited or restricted to 3 
frames, then D'''(n-3) is equal to zero. Otherwise, the same procedure is recursively applied 
to previous frame macroblocks. Similarly, if an Intra macroblock is encountered during the 
recursive processes, it is assumed that the distortion is equal to p \sigma^2_{EC}, because there is no 
motion vector, and thus no error-propagation term. 
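
The recursive estimate can be sketched as follows. This is a simplified illustration and not the encoder's actual implementation: the class RefMB and function expected_distortion are hypothetical names, each frame is assumed to contribute a single reference macroblock with an overlap weight of one, and the transition factors along the prediction path are assumed to accumulate additively in the denominator.

```python
from dataclasses import dataclass


@dataclass
class RefMB:
    sigma2: float        # concealment error energy of this macroblock
    gamma: float         # transition factor for the motion into the next newer frame
    intra: bool = False  # True if the macroblock was intracoded


def expected_distortion(p, chain):
    """Expected transmission distortion for the current macroblock.

    chain[0] is the current macroblock in Frame n, chain[1] its reference in
    Frame n-1, and so on. The recursion ends at an Intra macroblock or at the
    end of the chain (frame buffer limit), where the remaining term is zero.
    """
    q = 1.0 - p
    total, reach, gamma_sum = 0.0, 1.0, 0.0
    for k, mb in enumerate(chain):
        if k > 0:
            gamma_sum += mb.gamma          # attenuation on the way to Frame n
        total += reach * p * mb.sigma2 / (1.0 + gamma_sum)
        if mb.intra:
            break                          # no motion vector, no further propagation
        reach *= q                         # probability of arriving here without loss
    return total
```

For an Intra macroblock in the current frame the result reduces to p times its concealment error energy, since no propagation term is added.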

[0165] For the above distortion equations, contributions from each one of a 
maximum of 4 prediction macroblocks in each frame are summed and multiplied by the 
corresponding weighting factor that relates the area of overlap between each one of these 
macroblocks with the target macroblock in Frame_n. 

[0166] The information stored for each macroblock of previous frames can be 
utilized to calculate the expected distortion for each macroblock for the current Frame_n. Note 
that this expected distortion is due to errors in transmission and is not correlated to the 
distortion due to quantization for each macroblock. Therefore, the expected distortion term 
needs to be added to the quantization error to determine the total distortion for each 
macroblock. This total distortion is referred to as "total Inter-mode distortion," or D_INTER, as 
it relates to Inter mode encoding of macroblocks. 

[0167] For Intra mode, the expected distortion reduces to the first term as follows: 

D(n) = p \sigma^2_{EC}(n)     Equation 23 

[0168] reflecting the expected error distortion that needs to be added to the 
corresponding Intra-mode quantization distortion in order to get the "total Intra-mode 
distortion" for the macroblock, also referred to as D_INTRA. 

[0169] A certain number of bits are needed for the Inter mode encoding and the 
Intra mode encoding, respectively referred to as R_INTER and R_INTRA. The difference between 
these bit rates, \Delta R = R_INTRA - R_INTER, together with the difference in total distortion, 
\Delta D = D_INTRA - D_INTER, is used to select the best coding mode. 

[0170] If the encoder has sufficient resources and capability, a full Rate-Distortion 
optimization can be performed that involves determination of an optimal 
weighting factor \lambda to be used for evaluating a cost function for each macroblock, given by: 

C = D + \lambda R     Equation 24 

and thus obtain the following Intra/Inter decision rule: 

[0171] Choose Intra mode, when 

• \Delta D < 0, if \Delta R = 0 
• \Delta D / \Delta R < -\lambda, if \Delta R > 0 
• \Delta D / \Delta R > -\lambda, if \Delta R < 0 

Else select Inter mode 

[0172] Note that determination of the optimal \lambda parameter is optionally achieved 
by trying all possible QP and \lambda combinations. The particular combination that results in the 
least distortion among all combinations that produce bitstreams below or optionally equal to 
the desired bit rate is then selected. 
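
The Intra/Inter decision rule can be sketched as follows, by way of illustration. The function names are hypothetical. The sketch assumes delta_d is the Intra-mode total distortion minus the Inter-mode total distortion, delta_r is the Intra-mode bit count minus the Inter-mode bit count, and lam is a non-negative weighting factor. The second function states the same rule directly as a comparison of the two costs C = D + lambda * R.

```python
def choose_intra(delta_d, delta_r, lam):
    """Three-case form of the decision rule; True selects Intra mode."""
    if delta_r == 0:
        return delta_d < 0
    ratio = delta_d / delta_r
    if delta_r > 0:
        return ratio < -lam
    return ratio > -lam


def choose_intra_by_cost(delta_d, delta_r, lam):
    """Equivalent form: Intra is chosen when its cost is lower than the Inter cost."""
    return delta_d + lam * delta_r < 0
```

The three cases arise from dividing the cost inequality by delta_r, which reverses the inequality when delta_r is negative.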

[0173] Alternatively, the encoder system 100 can first choose macroblocks that 
satisfy the first of the above listed cases (\Delta D < 0, if \Delta R = 0), since it is applicable to all 
values of \lambda, and also all macroblocks that satisfy the condition \Delta D / \Delta R > 0, if \Delta R < 0, which 
automatically applies to the third case. Then macroblocks with \Delta R > 0 are grouped together 
and ordered in increasing order with respect to the ratio \Delta D / \Delta R. Similarly, the macroblocks 
with \Delta R < 0 are grouped together and ordered in decreasing order with respect to the same 
ratio, \Delta D / \Delta R. 

[0174] This is shown by the graph illustrated in Figure 8G, representing the value 
of the ratio \Delta D / \Delta R for each macroblock, where "x"s indicate those macroblocks with \Delta R > 0 
and "o"s indicate those with \Delta R < 0. 

[0175] The "x"s that have the most negative values, or negative values that meet 
corresponding defined criteria, and the "o"s that have the least negative values, or negative 
values that meet corresponding defined criteria, are selected as candidates for Intra Refresh. 
Note that "o"s with a positive value have already been chosen for Intra coding, and "x"s with 
positive value are excluded altogether as they are automatically intracoded. Experimental 
results indicate that macroblocks with \Delta R > 0, indicated by an "x", are the most common 
ones, because generally Intra mode costs more, as measured in bits, compared to Inter mode. 
Thus, optionally only macroblocks with \Delta R > 0 will be considered for Intra refresh. The 
so-called refresh rate, specifying how many additional macroblocks are to be encoded in intra 
mode, then dictates how many of the candidate macroblocks are eventually chosen. 

[0176] A less resource intensive process is to calculate the expected distortion for 
each macroblock due to transmission error, while ignoring or excluding quantization error. 
Then, the differential between the expected distortion for Intra and Inter mode can be used as 
the criterion for selecting macroblocks to be Intra coded by ordering them according to this 
criterion. 

[0177] Thus, Adaptive Intra Refresh (AIR) can be used to help determine how 
many macroblocks should be intra-encoded in the detected motion region of a frame. AIR 
can be enabled and disabled in the encoder parameter file using an AIR bit set by a user and 
read by the encoder module 106B. When AIR is enabled, the user also specifies another 
parameter, the AIR refresh rate. The AIR refresh rate determines how many macroblocks 
should be intra-coded in the detected motion region of one frame. Adaptive motion change 
detection can efficiently reduce the large propagation error, even when the error occurs in the 
motion region. 

[0178] Figure 8H illustrates an embodiment of the E-AIR process. At state 802H, 
the motion vector or vectors for the current macroblock of interest in Frame (n) are received. 
At state 804H the motion vector is used to locate which macroblocks from a previous Frame 
(n-1) are to be used in predicting the current macroblock. At state 806H a determination is 
made as to how much, in terms of area or pixels, of each of the located macroblocks in Frame 
(n-1) will be used in generating the current macroblock. At state 808H, the error variances 
(\sigma^2_{EC}, \sigma^2_{ECh/2}, \sigma^2_{ECv/2}, \sigma^2_{EChv/2}) are calculated, including the overlap weighting 
(w(i,j) = (w1 \times h1) / 256). At state 808H the propagation strength transition quantities are 
calculated based on the error variances. At state 812H, the Intra Error Distortion D_INTRA for 
Frame (n) is calculated. At state 814H, the Inter Error Distortion D_INTER for Frame (n) is 
recursively calculated. As 
discussed above, the recursive calculation can include the error distortion and quantization 
distortion from previous frames, such as Frame (n-1), Frame (n-2), and so on, whose errors 
may propagate to Frame (n). The recursion may be limited to a predetermined number of 
frame generations, until all or a predetermined amount of the frame buffer is being used, or 
the recursion may stop when an Intra frame is reached. 

[0179] At state 816H, the value DeltaD is calculated by taking the difference 
between D_INTRA and D_INTER, or by otherwise comparing D_INTRA and D_INTER. At state 818H the 
bit quantity or bit rate R_INTRA and R_INTER for intracoding Frame (n) and for intercoding Frame 
(n) respectively are determined. At state 820H a comparison of R_INTRA and R_INTER is made by 
calculating the difference DeltaR. At state 822H the decision to intracode or to intercode is 
made based on DeltaR, DeltaD and Lambda using the illustrated criteria. Alternatively, those 
macroblocks having the largest DeltaD may be chosen for intracoding. For example, after DeltaD is 
calculated for all of Frame (n)'s macroblocks, the two macroblocks having the largest DeltaD 
are intracoded. The macroblock intracode selection may also be based on a cost calculation 
where the Cost = Rate + lambda*D, or D + lambda*R, with the N (= AIR rate) macroblocks 
having the highest cost being chosen. 
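
The alternative selection described above, in which a fixed number of macroblocks with the largest DeltaD are intracoded, can be sketched as follows. The function name and the flat list representation of the per-macroblock DeltaD values are assumptions made for illustration.

```python
import heapq


def select_intra_candidates(delta_d, n):
    """Return, in scan order, the indices of the n macroblocks with the largest DeltaD.

    delta_d -- list of DeltaD values, one per macroblock of Frame (n)
    n       -- number of macroblocks to intracode (the AIR refresh rate)
    """
    largest = heapq.nlargest(n, range(len(delta_d)), key=lambda i: delta_d[i])
    return sorted(largest)
```

For example, with DeltaD values of 0.5, 3.0, 1.0 and 7.5 and a refresh rate of two, the second and fourth macroblocks are selected.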

[0180] Figure 8C illustrates experimental results comparing the use of Cyclic 
Intra Refresh, trace 802C, with the use of the enhanced AIR method described immediately 
above, trace 804C. The overall gain is approximately 1 dB in the PSNR. In one 
embodiment, the additional computational load is approximately 10%. 

[0181] In order to utilize AIR more effectively, conventional Cyclic Intra Refresh
(CIR) is combined with AIR. The number of the Intra Refresh macroblocks in a VOP is
defined as the summation of the AIR_refresh_rate and the CIR_refresh_rate.
AIR_refresh_rate macroblocks are encoded in AIR mode and CIR_refresh_rate macroblocks
are encoded in the conventional CIR mode. These values are user definable in the encoder
parameter file. When the channel degrades, higher CIR and AIR rates should preferably be
assigned. In addition, when the distance between I-frames is large, higher CIR and AIR rates
should preferably be assigned. These rates are preferably varied adaptively with changing
channel conditions as well as with the coding parameters to improve the tradeoff between
error resilience and coding efficiency.
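
One way to combine the two refresh modes in a single VOP is sketched below. The function name, the per-macroblock AIR score (for example DeltaD), the wrap-around CIR pointer, and the rule that a macroblock already picked by CIR is not also picked by AIR are all assumptions made for illustration.

```python
def choose_refresh_macroblocks(scores, cir_start, air_rate, cir_rate):
    """Pick AIR_refresh_rate + CIR_refresh_rate macroblocks for one VOP.

    scores    : per-macroblock AIR score (higher = more worth refreshing)
    cir_start : index at which the cyclic refresh resumes in this VOP
    Returns (air_indices, cir_indices, next_cir_start).
    """
    n = len(scores)
    # CIR: walk through the macroblocks in a fixed cyclic order.
    cir = [(cir_start + i) % n for i in range(min(cir_rate, n))]
    cir_set = set(cir)
    # AIR: take the highest-scoring macroblocks not already refreshed by CIR.
    ranked = sorted((i for i in range(n) if i not in cir_set),
                    key=lambda i: scores[i], reverse=True)
    air = sorted(ranked[:air_rate])
    next_start = (cir_start + cir_rate) % n
    return air, cir, next_start


air, cir, nxt = choose_refresh_macroblocks(
    scores=[5, 1, 9, 3, 7, 2], cir_start=4, air_rate=2, cir_rate=2)
```

The total number of refreshed macroblocks is `len(air) + len(cir)`, matching the summation of the two rates described above.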

[0182] The encoder parameter file specifies many different encoding parameters, 
including those discussed above. The encoder parameter file can be used in conjunction with 
the preprocessing module output by reading the frame-type file, which specifies the encoding
types determined by the preprocessing process described above. By way of example, the
encoder parameter file includes fields to enable/disable AIR, CIR, and SMC, to specify the 
AIR and CIR refresh rates, and a flag used to enable or disable the inclusion of two I-frames 
at the beginning of each scene, GOV or GOP. 

The encoder parameter file has the following parameters or fields:

Parameter                             Description

Version                               Version number and/or name
Source.Width                          Specifies frame width
Source.Height                         Specifies frame height
Source.FirstFrame                     Specifies the first frame to be encoded (counting from 0)
Source.LastFrame                      Specifies the last frame to be encoded
Source.Directory                      Directory to store the original source of sequence without
                                      trailing "\"
Source.SamplingRate                   Allows sub-sampling the original source based on this
                                      sampling rate parameter
Output.Directory.Bitstream            Output bitstream directory
Output.Directory.DecodedFrames        Directory to put the reconstructed frames from the encoder
                                      (encoder also performs decoding)
RateControl.Type [0]                  What type of rate control: one of "None" (maintain constant
                                      QP), "MP4" (for IMP4), "TM5" (for Test Model 5)
RateControl.BitsPerVOP [0]            Bit budget for the entire sequence
QuantType [0]                         One of "H263", "MPEG". For low bit rate communications,
                                      H263 is preferred.
GOV.Enable [0]                        GOV header present or not
GOV.Period [0]                        Number of VOPs between GOV headers
Texture.QuantStep.IVOP [0]            Quantization Parameter (QP) for I-VOP; not affected by rate
                                      control
Texture.QuantStep.PVOP [0]            QP for P-VOP if rate control is disabled
Texture.QuantStep.BVOP [0]            QP for B-VOP if rate control is disabled
Motion.PBetweenICount [0]             In case of multiple scenes, and in the presence of a
                                      frame-type file, the encoder ignores this parameter. Else,
                                      the length of a GOP is specified before source subsampling.
                                      A negative value means one GOP for the whole sequence.
Motion.ReadWriteMVs [0]               One of "Off", "Read", "Write"
Motion.ReadWriteMVs.Filename [0]      Filename for Read/write MV from/to files
ErrorResil.RVLC.Enable [0]            Enable or disable RVLC - 0: disable; 1: enable
ErrorResil.DataPartition.Enable [0]   Enable or disable data partitioning
ErrorResil.VideoPacket.Enable [0]     Enable or disable introduction of resync markers in video
                                      packet (VP)
ErrorResil.VideoPacket.Length [0]     If VideoPacket is enabled, the size of the VP in bits;
                                      select based on the target bit rate
ErrorResil.SMC.Enable                 Enable/disable second motion compensation (SMC). When = 1,
                                      only 1 PVOP; when = 2, for all PVOPs
ErrorResil.AIR.Enable                 Enable/disable adaptive intra refresh (AIR)
ErrorResil.CIR.Enable                 Enable/disable cyclic intra refresh (CIR)
ErrorResil.AIR.Rate                   Added for specifying the AIR rate. An AIR rate of 2, for
                                      example, may be used.
ErrorResil.CIR.Rate                   When enabling CIR (see above), specifies the CIR rate
                                      (macroblocks per VOP). A CIR rate of 2, for example, may be
                                      used.
ErrorResil.2I.Enable                  Added to enable/disable double I-frame coding in each GOV
                                      for enhanced error-resilience

Table 1 Encoder parameter specification.

[0183] To further increase error resiliency, a Header Extension Code (HEC) is
included by the encoder module 106B in every packet in a sequence of video packets or in
every video packet, and not just on the first video packet following the VOP header as with
conventional encoders. This better ensures that even if a packet is lost or corrupted,
subsequent packets can still be decoded and used. Further, even typical conventional
decoders will be able to handle the inclusion of the enhanced use of HECs as the use of
additional HECs is compatible with the MPEG-4 bitstream syntax. Adding a header,
including sequence information, to all packets increases overhead by only about 40 bits per
packet, or about 0.2%, but results in a noticeable improvement in decoding.

[0184] Further, Second-order Motion Compensation (SMC) is optionally
provided to enhance error resiliency. The SMC process is performed by the encoder module
106B and generates supplemental motion vectors so that each predicted frame can be
predicted separately from two preceding frames. Sequence 502 of Figure 5 illustrates the
SMC process, in which the kth frame has motion vectors from both the (k-1)th frame and the
(k-2)th frame. Therefore, even if the motion vectors from the (k-1)th frame are corrupted, or
the (k-1)th frame is itself corrupted, the kth frame can still be predicted from the (k-2)th frame
using the corresponding motion vectors. Thus, by inserting the redundant motion vectors,
also termed second-order motion vectors, from the (k-2)th frame to the kth frame, the scene image
quality at the decoder-side will be better protected from transmission errors. For example,
even if all the information for the kth frame is corrupted during transmission, the use of SMC can
effectively suppress error propagation by excluding the kth frame from being used in any later
prediction as illustrated in Figure 5 by sequence 504.

[0185] To perform SMC, a frame buffer is included in the encoder module 106B 
to store the previously decoded frame at time (t-2). This previously decoded frame is used to 
calculate the second order motion vectors. In one embodiment, these redundant motion 
vectors are not used in the encoder to produce residuals. The decoder uses the second order 
motion vectors when the bitstream is corrupted during the transmission and the first order 
motion vectors or corresponding frame is corrupted. Otherwise, the second order motion 
vectors need not be used in the decoder. Optionally, full, unrestricted motion search can be 
performed in order to determine these second-order motion vectors. The information 
regarding motion between frames (t-2) and (t-1) and between frames (t-1) and (t) can be
combined in order to estimate these second-order motion vectors. For this reason, the SMC
data can optionally be included, via the "user data mechanism", as explained below, for the
first P-VOP following a scene change or for each P-VOP. The advantage of having SMC on
only the first P-VOP is that bandwidth is not wasted when there is no error, thereby providing 
better coding efficiency. However, providing SMC for every P-VOP or for many P-VOPs 
enhances the robustness and decoding ability of the decoder, especially for cases of severe 
error conditions. 
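
The decoder-side use of the second-order motion vectors can be sketched as a simple fallback rule. The function name and the boolean inputs are hypothetical; how a decoder detects corruption, and what concealment it applies when neither reference is usable, are outside this sketch.

```python
def choose_reference(prev_ok, mv1_ok, prev2_ok, mv2_ok):
    """Decide which reference frame to predict the current frame (t) from.

    prev_ok  : frame (t-1) was decoded without detected errors
    mv1_ok   : first-order motion vectors (t-1 -> t) were received intact
    prev2_ok : frame (t-2) was decoded without detected errors
    mv2_ok   : second-order motion vectors (t-2 -> t) were received intact
    Returns 1 for frame (t-1), 2 for frame (t-2), or None when the decoder
    has to fall back to some other concealment method.
    """
    if prev_ok and mv1_ok:
        return 1   # normal case: second-order vectors are not needed
    if prev2_ok and mv2_ok:
        return 2   # error case: predict from (t-2) using the SMC vectors
    return None
```

In the error-free case the function always returns 1, which reflects the statement above that the second-order motion vectors need not be used unless the bitstream is corrupted.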

[0186] An additional video packet, referred to as "User data" or an "SMC" video 
packet, for each P-VOP is used to transmit these second-order motion vectors. This packet 
contains, in the same predictive fashion and using the same variable-length codes as in the 
standard motion vectors, a motion vector for each macroblock or selected macroblocks of the 
current P-VOP. An HEC is included in this special SMC video packet, which allows the 
SMC video packet to be decoded even if other packets for this P-VOP are lost. In one 
embodiment, this packet is positioned in the bitstream at the end of each P-VOP. A user can 
enable or disable the use of SMC by setting to 1 or 0 the corresponding option in the encoder 
parameter file. Figure 6 illustrates an example packetized bitstream showing the relative 
position of packets in the bitstream, including the SMC packet 602. 

[0187] Optionally, in order to make SMC packets compliant with the MPEG-4
syntax, a so-called "User data start code" (hex code B2) or the like, including other unique
identifier codes to be assigned in the future by MPEG-standards committee and the like,
precedes the HEC and motion vector information. The user data start code signals standard
decoders not capable of using the second order motion vectors to ignore all bits following it
until the next start code in the bitstream, which in this case will be a VOP start code. In one
embodiment, the encoder includes a unique 16-bit identifier in order not to confuse the SMC 
user data extensions with data that other people may decide to include in the bitstream 
following the same convention. 
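
The user data convention described above can be illustrated with a byte-level sketch. The start code prefix 0x000001 followed by B2 is the MPEG-4 user data start code; the 16-bit identifier value `SMC_ID`, the function names, and the opaque payload layout are assumptions made for illustration. The sketch also assumes the payload never contains the byte pattern 00 00 01, since that would emulate a start code.

```python
USER_DATA_START_CODE = b"\x00\x00\x01\xB2"
START_CODE_PREFIX = b"\x00\x00\x01"
SMC_ID = 0x534D  # hypothetical 16-bit identifier; not defined by the source


def wrap_smc_user_data(payload: bytes, ident: int = SMC_ID) -> bytes:
    """Prefix the payload with the user data start code and a 16-bit identifier."""
    return USER_DATA_START_CODE + ident.to_bytes(2, "big") + payload


def extract_smc_user_data(bitstream: bytes, ident: int = SMC_ID):
    """Return the payloads of all user data sections carrying the given identifier.

    A decoder that does not know the identifier would simply skip from the
    user data start code to the next start code.
    """
    found = []
    pos = bitstream.find(USER_DATA_START_CODE)
    while pos != -1:
        body_start = pos + len(USER_DATA_START_CODE)
        nxt = bitstream.find(START_CODE_PREFIX, body_start)
        end = len(bitstream) if nxt == -1 else nxt
        body = bitstream[body_start:end]
        if len(body) >= 2 and int.from_bytes(body[:2], "big") == ident:
            found.append(body[2:])
        pos = bitstream.find(USER_DATA_START_CODE, end)
    return found


VOP_START_CODE = b"\x00\x00\x01\xB6"
stream = (VOP_START_CODE + b"vop" + wrap_smc_user_data(b"\x12\x34")
          + wrap_smc_user_data(b"zz", ident=0x1111) + VOP_START_CODE + b"next")
smc_payloads = extract_smc_user_data(stream)
```

The second user data section in the example carries a different identifier and is ignored, which is the purpose of the unique 16-bit identifier.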

[0188] To further enhance error resiliency, two consecutive I-frames are inserted
upon a scene change, even if the second of the two frames is not a scene change frame and
has low enough relative motion that it would normally be intercoded, as illustrated in Figure
7. That is, the scene change frame 702 is intracoded, and the next frame 704 is then
automatically intracoded. Thus, the loss of one I-frame will not prevent the decoder from
decoding the predicted frames 706, 708, 710, 712 that are predicted from frame 702 as well
as frame 704. Frames subsequent to the second I-frame may be encoded as intercoded
frames, such as P-frames or B-frames.

[0189] The use of two consecutive I-frames advantageously prevents the 
prediction of a frame in the current scene using scene content from other scenes, without 
degrading the performance of the SMC. Because the first two consecutive frames in a scene 
are intracoded, neither first nor second-order motion vectors are inserted into the I-frames.

[0190] The inclusion of the consecutive I-frames can be under the control of the
preprocessing module 102B which can designate both a scene change frame and the next 
frame as intracoded frames. Alternatively, the encoder module 106B can automatically 
intracode a frame following a frame designated as an intracoded frame by the preprocessing 
module 102B. 
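
The second alternative, in which the encoder automatically intracodes the frame following a designated I-frame, can be sketched as follows. The function name and the use of "I", "P", "B" strings for frame types are assumptions; the sketch only promotes the frame after the first I-frame of a run, so that a pair already designated by the preprocessing module is not extended to three.

```python
def force_double_i_frames(frame_types):
    """Return a copy of frame_types in which each scene-start I-frame
    is followed by a second I-frame.

    frame_types: list of "I", "P" or "B", as designated by preprocessing.
    """
    out = list(frame_types)
    for i in range(1, len(frame_types)):
        prev_is_i = frame_types[i - 1] == "I"
        # Only the first I-frame of a run triggers the promotion.
        prev_starts_run = (i - 1 == 0) or frame_types[i - 2] != "I"
        if prev_is_i and prev_starts_run:
            out[i] = "I"
    return out


doubled = force_double_i_frames(["I", "P", "P", "I", "P", "B"])
```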

[0191] While the insertion of two consecutive I-frames increases the bit-budget 
and thus decreases coding efficiency for a certain transmission bit-rate, in error-prone 
environments this inefficiency is more than compensated for by the additional
error-resilience thereby provided. A "consecutive I-frame" flag is provided in the encoder
parameter file that can be independent of the SMC flag. Advantageously, the presence of two
consecutive I-frames at the beginning of each scene can be used for the decoder to conceal
transmission errors more efficiently, even if the SMC mode is not turned on, or when it is
turned on for just the first P-VOP following the (double) I-frame after a scene change. 

[0192] Adaptive Intra Refresh (AIR) is optionally supported by the encoder 
module 106B as a by-product of SMC. This mode, enabled when selecting SMC for just the 
first P-VOP or for every P-VOP, encodes in INTRA mode those macroblocks that have as 
prediction macroblocks from frames (t-1) and (t-2) two significantly different macroblocks, 
as measured by the MAD distance measure. An example threshold is 20. Thus, if the MAD 
between the two prediction macroblocks for a given macroblock in the current frame is 
greater than 20, this macroblock is intracoded. 
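
The threshold test can be sketched as below, taking MAD to be the mean absolute difference between corresponding pixels of the two prediction macroblocks. The function names and the nested-list block representation are assumptions; the default threshold of 20 is the example value given above.

```python
def mad(block_a, block_b):
    """Mean absolute difference between two equally sized pixel blocks."""
    total = 0
    count = 0
    for row_a, row_b in zip(block_a, block_b):
        for a, b in zip(row_a, row_b):
            total += abs(a - b)
            count += 1
    return total / count


def should_intracode(pred_from_t1, pred_from_t2, threshold=20):
    """True when the predictions from frames (t-1) and (t-2) differ significantly."""
    return mad(pred_from_t1, pred_from_t2) > threshold


# Tiny 2x2 blocks stand in for full macroblocks.
flat = [[10, 10], [10, 10]]
bright = [[40, 40], [40, 40]]
```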

[0193] The encoder module 106B also performs general encoder functions, such
as motion estimation, residual calculation, and the like. The encoder output can be stored for 
later transmission or can be transmitted in substantially real-time to a receiving terminal, 
such as a cellular phone, containing an appropriate decoder. 

[0194] To increase error resilience, in one embodiment intra_dc_vlc_thr is set to
"0", so that all DC coefficients are coded using DC VLC in a frame or VOP. In addition, the
ac_pred_flag may be disabled for all Intra macroblocks. Both these options are permitted by
the syntax and so are supported by standard decoders, and can result in higher quality for the
case of error in transmission. This improvement can be on the order of 0.1-0.2 dB in PSNR.
In particular, when data partitioning is enabled, the DC coefficient of each 8x8 DCT block of 
an INTRA macroblock can either be coded together with the 63 AC DCT coefficients, using 
what is known as an "INTER VLC" table, or separately, using what is known as an "INTRA 
VLC" table. 

[0195] Using the INTRA VLC table results in separating the corresponding bits 
for the DC coefficient from those of the remaining 63 AC coefficients as follows:

[0196] For an I-VOP, where the macroblocks are all intracoded, the DC data bits
are located before the DC marker (DCM), together with the header bits, while the data bits 
for AC coefficients are placed after the DC marker. 

[0197] For a P-VOP, the DC data bits are placed immediately after the Motion 
Marker (MM), together with other crucial or important information, and the data bits for AC 
coefficients follow. 

[0198] This separation of DC and AC information enhances error resilience, since
the DC data can be decoded and trusted even for corrupted packets, if the DCM/MM markers
are correctly hit during the decode process. Further, with respect to P-VOPs, Reversible
Variable Length Code (RVLC) forward/backward decoding can reveal at least one good first
part.

[0199] To control whether the DC data is coded together with or separately from the AC
coefficients, the flag called "intra_dc_vlc_thr" is used for each VOP; it maps each
macroblock, according to its QP value, to one of the two tables. Setting this value = 0
means that all macroblocks, regardless of their QP, should use an INTRA DC table and thus
separate DC data from AC coefficients. This is a standard syntactic element in MPEG-4 and
thus supported by standard decoders.

[0200] The ac_pred_flag is another option that indicates whether for a specific 
block of an INTRA macroblock, the top row and first column DCT coefficients are coded
independently or differentially with respect to the neighboring blocks. To enhance error 
resilience it is preferable to set ac_pred_flag to 0. 

[0201] In one embodiment, error-correction is supported at the source level by 
using Forward Error Correction (FEC). In particular, Bose-Chaudhuri-Hocquenghem (BCH) 
codes, including Reed-Solomon, are supported. As is well known in the art, BCH is an error 
detection and correction technique based on Cyclic Redundancy Code. For any positive
integers m and t, with m ≥ 3 and t < 2^(m-1), there is a binary BCH code with a block length
n equal to 2^m - 1 and n - k ≤ mt parity check bits, where k is the number of information bits.
The BCH code has a minimum distance of at least 2t + 1. Each binary BCH code (n, k, t) can
correct up to t
bit errors, and thus it is also referred to as a t-error-correcting code. 

[0202] Different block sizes may be used. In one embodiment, a block size of 
511 is used. FEC is performed at a packetizer level of the elementary video bitstream, which 
can be considered as source-level error correction. By contrast, channel level error-correction 
introduces redundancy at the bit-level after multiplexing. 
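
The parameter relations for binary BCH codes can be checked numerically for the block size of 511 mentioned above. The function name and the returned dictionary keys are hypothetical, and the choice t = 4 in the example is illustrative only; the source does not state which t is used.

```python
def bch_parameters(m, t):
    """Bounds for a binary t-error-correcting BCH code of length 2^m - 1.

    Uses the standard relations: n = 2^m - 1, n - k <= m*t, d_min >= 2t + 1,
    valid for m >= 3 and t < 2^(m-1).
    """
    if m < 3 or not (1 <= t < 2 ** (m - 1)):
        raise ValueError("requires m >= 3 and 1 <= t < 2^(m-1)")
    n = 2 ** m - 1
    max_parity = m * t
    return {
        "n": n,                                   # block length in bits
        "max_parity_bits": max_parity,            # upper bound on n - k
        "min_info_bits": max(n - max_parity, 0),  # lower bound on k (may be loose)
        "min_distance": 2 * t + 1,                # lower bound on d_min
    }


params = bch_parameters(m=9, t=4)   # block size 511
overhead = params["max_parity_bits"] / params["n"]
```

For m = 9 and t = 4 this gives at most 36 parity bits in a 511-bit block, an overhead of roughly 7%, which illustrates the tradeoff between redundancy and correcting capability.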

[0203] FEC can provide significant error resilience, at the cost of some bit 
budget. Figure 11 illustrates an example graph of Forward Error Correction overhead vs. 
average BER correction capability. As illustrated, there is a close relation between FEC
redundancy and error correcting capability, which is a strong indicator of error resilience.
Preferably, at least double the expected BER is provided for. 

[0204] In one embodiment of the present invention, rather than apply FEC to all
packet data, a more efficient process is performed that reduces the number of error correction
bits generated as compared to conventional approaches, while still providing significant error
correcting capability. One embodiment of the FEC process optionally generates FEC bits
only for selected portions of the packets, and in particular, for those portions that are
considered more essential or important for purposes of reproducing a frame sequence by the
decoder. In addition, the FEC process provides a systematic code, that is, the FEC correction
or parity bits are separate from the original uncoded data bits. Thus, even if all the FEC bits
are lost, the original selected portions of the packet are still potentially decodable. Further, in 
one embodiment the FEC data is encoded and transmitted in an MPEG-4 compliant manner 
as explained below. Thus, even if a decoder that is not equipped to process the FEC data 
receives the FEC packet, the decoder will still be able to process the frame motion and 
texture data. 

[0205] In particular, FEC is efficiently applied to important data, such as motion
vectors, DC coefficients and header information, and FEC bits are not generated for 
unimportant or less important data. This more important data may be located in a packet 
between a packet resync field and a motion marker. In particular, for a given frame or VOP, 
the selected bits targeted for FEC coding are concatenated together with those from other 
frame packets and the FEC code bits are generated for the concatenated bits. 

[0206] In one embodiment, rather than including the FEC bits in the same packet 
or packets as the frame data, for a given frame or VOP, the resulting FEC bits are placed in 
an additional packet after the regular frame or VOP packets to ensure MPEG compatibility. 

[0207] In addition, to better allow the decoder to recover in cases where a data
packet is lost or has a motion marker missing, for each standard packet, in the FEC packet a
packet identifier is stored in association with a corresponding value indicating how many bits
and/or which bits were used to generate FEC bits.
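
The bookkeeping for the additional FEC packet can be sketched as follows. The function name, the bit-string representation, and the dictionary layout are assumptions. The parity function is passed in by the caller; the single even-parity bit used in the example is only a stand-in and is not a BCH code.

```python
def build_fec_packet(packets, parity_fn):
    """Assemble the contents of the FEC packet for one frame or VOP.

    packets   : list of (packet_id, important_bits), where important_bits is a
                string of '0'/'1' taken from the protected part of each packet
                (for example, between the resync field and the motion marker)
    parity_fn : function mapping the concatenated bit string to parity bits
    The original packets are left untouched (systematic code): only the
    parity bits and a per-packet index are placed in the FEC packet.
    """
    concatenated = "".join(bits for _, bits in packets)
    # Record, per packet, how many bits contributed to the FEC computation.
    index = [(pid, len(bits)) for pid, bits in packets]
    return {"index": index, "parity": parity_fn(concatenated)}


def even_parity(bits):
    """Stand-in parity function: one even-parity bit (NOT a BCH encoder)."""
    return str(bits.count("1") % 2)


fec = build_fec_packet([(0, "1011"), (1, "110")], even_parity)
```

With the per-packet index, a decoder that has lost a packet or its motion marker can still tell which positions of the concatenated bit string that packet occupied.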

[0208] In order to maintain compatibility with reference or conventional MPEG-4
decoders, this additional FEC packet further includes a user data identifier code, 
"user_data_start_code," used to identify user defined data, and as such will be ignored by 
conventional decoders not equipped to process the FEC packet. In addition, where there is 
no error, the FEC packet will not be used by decoders equipped to handle the FEC bits. But 
when errors do occur, FEC decoding will help recover data that will allow for decoding even 
under severe error conditions. 

[0209] Thus, as described above, by providing error resiliency and enhanced
compression, embodiments of the present invention advantageously enable the transmission
of video information even in low-bit rate, high noise environments. For example,
embodiments of the present invention enable video transmission to be successfully performed
over cellular networks and the like.

[0210] Although this invention has been described in terms of certain preferred 
embodiments, other embodiments that are apparent to those of ordinary skill in the art are also 
within the scope of this invention. 